(57) Abstract

The present invention provides crystallized protective protein/cathepsin A (PPCA), a precursor thereof (pPPCA) or at least one subdomain thereof; methods for x-ray diffraction analysis to provide x-ray diffraction patterns of sufficiently high resolution for three-dimensional structure determination of the protein; and methods for rational drug design (RDD) based on amino acid sequence data and/or x-ray crystallography data provided on computer readable media, as analyzed on a computer system having suitable computer algorithms.
WO 97/15588 PCT/US96/17325

Protective Protein/Cathepsin A and Precursor: Crystallization, X-Ray Diffraction, Three-Dimensional Structure Determination and Rational Drug Design

Background of the Invention 

Statement as to Rights to Inventions Made Under Federally-Sponsored Research and Development

Part of the work performed during development of this invention utilized U.S. Government funds. The U.S. Government has certain rights in this invention.
Field of the Invention 

The present invention is in the fields of molecular biology, protein purification, protein crystallization, x-ray 
diffraction analysis, three-dimensional structure determination and rational drug design (RDD). The present invention 
provides crystallized protective protein/cathepsin A (PPCA) and its precursor (pPPCA). The crystallized PPCA or 
pPPCA is analyzed by x-ray diffraction techniques. The resulting x-ray diffraction patterns are of sufficiently high 
resolution to be useful for determining the three-dimensional structure of the PPCA or pPPCA protein, and for RDD. 
Related Background Art

A deficiency of the human protective protein/cathepsin A (PPCA, also known as human protective protein or HPP) has been identified as the primary genetic defect underlying galactosialidosis (d'Azzo et al., Proc. Natl. Acad. Sci. USA 79:4535-4539 (1982)), a lysosomal storage disease inherited as an autosomal recessive trait. Patients with this disorder are diagnosed as having drastically reduced β-galactosidase and neuraminidase activities in their cell lysosomes. Examples of lysosomal storage diseases are presented in Table 316-1 of Braunwald et al., eds., Harrison's Principles of Internal Medicine, 11th Ed., pp. 1661-1671, McGraw Hill Book Co., New York (1987); as well as in Wenger et al., Biochem. Biophys. Res. Commun. 52:589-595 (1978); Tettamanti et al., eds., Sialidases and Sialidosis, Perspectives in Inherited Metabolic Diseases, Vol. 4, Edi. Ermes, Milano (1981), pp. 261-279 and 379-395; and van Diggelen et al., Lancet 2:804 (1987), which references are entirely incorporated herein by reference.
Researchers have proposed that one of PPCA's functions is to stabilize β-galactosidase and neuraminidase in a multi-enzyme complex, which complex is deficient in galactosialidosis patients (d'Azzo et al. (1982), infra; Hoogeveen et al. (1983), infra). Evidence for this protective function comes from studies showing that PPCA is taken up from the culture medium by galactosialidosis fibroblasts and restores both β-galactosidase and neuraminidase activities in these fibroblasts (d'Azzo et al. (1982), infra).
The cDNA for PPCA directs the synthesis of a 452-amino-acid precursor PPCA (pPPCA) (Figure 13) with a molecular weight of 54 kDa (Galjart et al., Cell 54:755-764 (1988)). The amino acid sequences of PPCA (Figure 14) and pPPCA (Figure 13) contain two glycosylation sites (Asn 117 and Asn 305), both of which are glycosylated in cultured fibroblasts and in cells over-expressing PPCA or pPPCA. pPPCA dimerizes soon after synthesis in the endoplasmic reticulum (ER) (Zhou et al., EMBO J. 10:4041-4048 (1991)).
Lysosomal PPCA has cathepsin A/deamidase/esterase activities, which are exerted in vitro on a specific subset of bioactive peptides. Non-limiting examples of peptides hydrolyzed by PPCA are: substance P and substance P-free acid; oxytocin and oxytocin-free acid; neurokinin A; angiotensin I; and bradykinin (Jackman et al. (1990), infra). Furthermore, the enzyme inactivates endothelin I in rat smooth muscle cells and normal human tissues; this activity was deficient in liver from a galactosialidosis patient (Itoh (1995), infra; Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)).
Endothelins (ET-1, ET-2 and ET-3) are potent vasoconstrictors that elevate blood pressure in mammals; they also influence cell proliferation and hormone production and have been implicated in cardiovascular disorders ranging from hypertension to stroke to ischemic heart disease (Rubanyi and Polokoff, Pharmacol. Rev. 45:325-415 (1994)).

The three-dimensional structure of a PPCA or a pPPCA has not previously been published. Such a structure could help delineate the specific biological activities of the protein and identify ligands as therapeutics for PPCA-related pathologies. Accordingly, there is a need to provide three-dimensional structures of at least one PPCA, pPPCA or ligand thereof for the diagnosis or therapy of PPCA-related pathologies.
Summary of the Invention
The present invention provides methods of expressing, purifying and crystallizing a human protective protein/cathepsin A (PPCA) and its precursor, precursor protective protein/cathepsin A (pPPCA). The present invention also provides methods for obtaining crystallized PPCA or pPPCA that can be analyzed to obtain x-ray diffraction patterns of sufficiently high resolution to be useful for three-dimensional structure determination of the protein.

The x-ray diffraction patterns can either be analyzed directly to provide the three-dimensional structure (if of sufficiently high resolution), or the atomic coordinates for the crystallized PPCA or pPPCA, as provided herein, can be used for structure determination. The x-ray diffraction patterns obtained by methods of the present invention, and provided on computer readable media, are used to provide electron density maps. The amino acid sequence is also useful for three-dimensional structure determination. The data are then used in combination with phase determination (e.g., using multiple isomorphous replacement (MIR) or molecular replacement techniques) to generate electron density maps of a PPCA or a pPPCA, using a suitable computer system.

The electron density maps, obtained either by analysis of the x-ray diffraction patterns or by working backwards from the atomic coordinates provided herein, are then fitted using suitable computer algorithms to generate secondary, tertiary and/or quaternary domains of a PPCA or a pPPCA. These domains are then used to provide an overall three-dimensional structure, as well as the expected binding and active sites, of the PPCA or pPPCA. pPPCA has some of the active and binding sites of PPCA, apart from structural changes due to the presence of the portion of pPPCA that is deleted during maturation to PPCA (e.g., residues 285-298 of Figure 13).

Structure determination methods and computer systems are also provided by the present invention for rational drug design (RDD). These RDD methods use computer modeling programs to find potential ligands that are calculated to associate with, or bind to, sites or domains of a PPCA or a pPPCA. Potential ligands are then screened for modulating or binding activity. Such screening methods can be selected from assays for at least one PPCA-specific structural feature or biological activity, preferably as associated with a PPCA- or pPPCA-related pathology, e.g., protective activity (e.g., modulation of β-galactosidase activity and neuraminidase (NA) activity) and peptide- or enzyme-modulating activity (e.g., of endothelin I (serine carboxypeptidase), neuropeptides, cathepsin A, and the like), according to known assays. The resulting ligands provided by methods of the present invention are synthesized and are useful for treating, inhibiting or preventing at least one PPCA-related pathology in a mammal.

Other objects of the invention will be apparent to one of ordinary skill in the art from the following detailed 
description and examples relating to the present invention. 
Brief Description of the Figures

Figure 1 is a schematic ribbon diagram of the PPCA monomer (monomer 1); secondary structure assignments are according to DSSP (Kabsch and Sander, Biopolymers 22:2577-2637 (1983)). The 'core' domain is shown in yellow. The 'cap' domain consists of a 'helical' subdomain, in red, and a 'maturation' subdomain, in orange. The catalytic triad Ser 150, His 429 and Asp 372 (from right to left) is shown by small green spheres. (Figure generated using MOLSCRIPT (Kraulis, J. Appl. Cryst. 24:946-950 (1991)).)

Figure 2 is a stereo diagram of the Cα trace of PPCA monomer 1, with numbering of selected residues. The residues forming the α-helices and β-strands are as follows according to DSSP:

Core domain: Cβ1 (21-27); Cβ2 (32-39); Cβ3 (50-54); Cα1 (63-67); Cβ4 (73-75); Cβ5 (82-84); Cβ6 (94-98); Cα2 (118-135); Cβ7 (144-149); Cα3 (152-163); Cβ8 (171-177); Cα4 (307-313); Cα5 (316-321); Cα6 (336-341); Cα7 (350-359); Cβ9 (363-369); Cα8 (377-386); Cβ10 (391-401); Cβ11 (407-416); Cβ12 (419-424); Cα9 (431-434); Cα10 (436-447).

Cap domain: Hα1 (183-196); Hα2 (202-212); Hα3 (226-240); Mβ1 (261-264); Mβ2 (267-270); Mα1 (290-293); Mβ3 (296-299).

Note that for monomer 2 the secondary structure assignments in the cap domain are slightly different from those in monomer 1. Residues in Hβ1 are in a region of poor density and Mα1 is an extended coil. (Figure generated using MOLSCRIPT (Kraulis (1991), infra).)
Figure 3 shows the electron density for the disulfide bridges Cys 212-Cys 228 and Cys 213-Cys 218, as revealed in the SigmaA-weighted 2mFo-DFc electron density map (Read, Acta Crystallogr. A42:140-149 (1986)) calculated from the model refined to 2.2 Å; the map is contoured at 1σ. (Figure drawn with the computer program O (Jones, Acta Crystallogr. A47:110-119 (1991)).)
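Contouring a map "at 1σ" means drawing the density surface at a level of one root-mean-square (rms) deviation of the map values about their mean. The following is a minimal Python sketch of that thresholding step, assuming the map is available as a flat list of density values on a grid; the function names and the toy map are hypothetical and do not reflect how O or any other program stores or contours maps.

```python
import math

def map_sigma(values):
    """Return the mean and rms deviation (sigma) of a list of map values."""
    n = len(values)
    mean = sum(values) / n
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, rms

def points_above_contour(values, level_in_sigma=1.0):
    """Indices of grid points at or above mean + level_in_sigma * sigma."""
    mean, sigma = map_sigma(values)
    threshold = mean + level_in_sigma * sigma
    return [i for i, v in enumerate(values) if v >= threshold]

# Hypothetical 1D "map": weak background with two strong peaks
toy_map = [0.0, 0.1, -0.1, 2.0, 0.0, -0.2, 1.8, 0.1]
print(points_above_contour(toy_map, 1.0))  # [3, 6]
```

Because a map computed without the F(000) term has a mean of zero over the unit cell, contouring relative to the mean or relative to zero gives the same level in that case.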

Figure 4 is a stereo diagram of the superimposed Cα traces of the two crystallographically independent PPCA monomers forming the dimer. Monomer 1 is in blue, monomer 2 is in red. Residues referred to in the text are labeled. Residues 259 and 260 have not been incorporated in the model of monomer 2, since no electron density was observed for them. Note the large difference in conformation of the excision peptide, located in the upper right corner of the proteins. (Figure generated by MOLSCRIPT (Kraulis (1991), infra).)

Figure 5 is a schematic ribbon diagram of the PPCA dimer viewed approximately along the twofold axis. For monomer 1, the core domain is yellow, while the cap domain consists of a helical subdomain in red and a maturation subdomain in orange. For monomer 2, the core domain is green, while the cap domain consists of a blue helical subdomain and a light blue maturation subdomain. (Figure generated using MOLSCRIPT (Kraulis (1991), infra).)
Figure 6A-B is a representation of the molecular surface of the PPCA dimer. The surface was calculated with GRASP (Nicholls, A., et al., Proteins 11:281-296 (1991)) and colored according to the electrostatic potential. Dark blue corresponds to a positive potential > +15.0 kT/e and dark red to a negative potential < -15.0 kT/e. Figure 6A: standard view, along the diad, with the dimer oriented as in Figure 4. Figure 6B: side view of the dimer, rotated by 90° with respect to Figure 6A.
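The potential scale in kT/e can be converted to more familiar units using kT/e = k_B·T/e, which is about 25.7 mV near room temperature. A minimal sketch, assuming T = 298 K (the temperature used for the GRASP calculation is not stated here); the function name is hypothetical.

```python
# Physical constants (exact SI values since the 2019 redefinition)
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19

def kt_per_e_to_millivolts(potential_kt_per_e, temperature_k=298.0):
    """Convert an electrostatic potential in kT/e units to millivolts."""
    volts_per_unit = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return potential_kt_per_e * volts_per_unit * 1000.0

print(round(kt_per_e_to_millivolts(15.0)))  # about 385 mV at 298 K
```

Under this assumption, the ±15.0 kT/e color limits of Figure 6 correspond to roughly ±385 mV.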

Figure 7A-F presents a topological comparison of six members of the hydrolase fold family. The arrangement of structural elements in the central core domain (in green and yellow) of the different proteins is generally similar, whereas the cap domains (in red) vary greatly. The following structures are shown, starting from the top left-hand corner (references and PDB entry codes are given in parentheses): Figure 7A shows the PPCA precursor, whose cap domain consists of two subdomains, one α-helical and the other mainly β-sheet; Figure 7B shows CPW (3SC2; Liao et al. (1992), infra), with a helical cap domain; Figure 7C shows CPY (1YSC; Endrizzi et al. (1994), infra), with a helical cap domain; Figure 7D shows dehalogenase (2HAD; Franken et al., EMBO J. 10:1297-1302 (1991)), with a helical cap domain that is quite different from those of the serine carboxypeptidases; Figure 7E shows lipase from Pseudomonas glumae (1TAH; Noble et al., FEBS Lett. 331:123-128 (1993)), with a cap domain of mixed α-helices and β-strands; and Figure 7F shows acetylcholinesterase (1ACE; Sussman et al., Science 253:872-879 (1991)), with a large, predominantly α-helical cap domain. The secondary structure assignments were generated with the computer program O, using structures provided by and/or available from the Brookhaven Protein Data Bank. (This Figure was generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 8A-B shows superpositions of PPCA and CPW. Figure 8A shows the superposition of the Cα traces of the PPCA and CPW monomers, showing that the major differences between the two enzymes are localized in the cap domain: PPCA has a large 'maturation' subdomain, and its 'helical' subdomain is rotated with respect to the CPW counterpart. (Figure drawn with the O program (Jones (1991), infra).) Figure 8B shows the Cα traces of the PPCA and CPW dimers after the core domains of the subunits shown on the right-hand side of the two dimers have been superimposed. Notice the remarkable difference (15°) in the orientation of the subunits on the left-hand side of the two dimers, which has been accentuated by an arrow. (Figure drawn with the O computer program (Jones (1991), supra).)

Figure 9 is a stereo view of the Cα trace of PPCA monomer 1, highlighting regions involved in the maturation event. The color scheme for the trace is as follows: core domain in light blue, helical subdomain in red, and maturation subdomain in orange, with the exception of the excision peptide (residues 285-298), which is shown in blue. Orange spheres mark residues 272 and 277, the beginning and end of the blocking peptide. The catalytic triad Ser 150, His 429 and Asp 372 is shown as light blue spheres. Two cysteines referred to in the discussion, Cys 253 and Cys 303, are colored green. (This Figure was generated using MOLSCRIPT (Kraulis (1991), infra).)

Figure 10 is a close-up representation of the 'blocking' peptide (residues 272-277) bound in the active site, rendering the catalytic triad solvent-inaccessible. Residues from the maturation subdomain are shown in orange, residues from the helical subdomain in magenta and residues from the core domain in cyan. The excision peptide is shown in blue. Side chains are shown for residues making extensive contacts with the blocking peptide or mentioned in the text. The catalytic triad is shown in white. (Figure drawn with O (Jones (1991), infra).)

Figure 11 is a representation of elements proposed to be involved in the activation mechanism of the precursor form of PPCA, as discussed in the text. The Cα trace of the core domain is shown in cyan, the helical subdomain in red, the maturation subdomain in orange, and the excision peptide in blue. Relevant side chains are depicted and labeled. Rearrangement of residues 254-302, delimited by the disulfide between Cys 253 and Cys 303, would free up the active site cleft. A charge cluster of Arg 262, Glu 264, Arg 298 and Asp 300 occupies a strategic position within the maturation subdomain and is possibly involved in pH-dependent regulation of conformational changes. The solvent-accessible surface was calculated and visualized with the atomic coordinates by BIOGRAF (BIOGRAF Construct Users Guide, Version 3.2.1, June 1993).

Figure 12 is a schematic representation of the proposed activation of PPCA. The active site cleft is formed by the core domain (indicated as 'core' in the scheme) and the helical subdomain (indicated as 'α'). The maturation subdomain (indicated as 'm') contains the residues that block the active site cleft, rendering the precursor enzymatically inactive, as shown in structure 1. In the acidic endosome/lysosome, the precursor undergoes activation. In activation pathway 2a, conformational rearrangements induced by low pH might first render the excision peptide more accessible to proteases, followed by cleavage of the polypeptide chain that removes the excision peptide. Alternatively, in pathway 2b, proteolytic cleavage of the excision peptide might form the trigger for the total rearrangement, removing the blocking peptide from the active site and thus generating the fully active enzyme, as shown in structure 3.

Figure 13 shows the amino acid sequence of a human pPPCA. The underlined portion (residues 285-298) shows an excision peptide removed on conversion to the mature form, PPCA.

Figure 14 shows the amino acid sequence of a human PPCA. 

Figure 15 shows a sequence alignment among pPPCA, CPW and CPY (the top three sequences shown). Residues identical in all three sequences are boxed. Residue numbering is included for the pPPCA amino acid sequence. The alignment was made using the GCG program PILEUP (GCG version 8), then manually adjusted using 3D-structural knowledge from the superposition of the CPW (Liao et al., 1992) and CPY (Endrizzi et al., 1994) atomic coordinates. The alignment was later used to design a multi-Ala search probe for molecular replacement calculations, shown as the fourth sequence ('model'). The structure determination of pPPCA subsequently revealed that the protein can be divided into two domains: a 'core' domain (residues 1-182 and 303-452) and a 'cap' domain (residues 183-302). The secondary structure elements for the PPCA precursor are depicted with shaded bars (for details on the assignment and nomenclature, see Rudenko et al., Structure 3:1249-1259 (1995)).

Figure 16 shows a schematic representation of a 'bootstrapping' cycle as described in Example 2.
Figure 17 is a representation of an initial molecular mask, enlarged to accommodate missing areas in the model. The program MAMA (Kleywegt & Jones, 1994) was used to calculate the mask, and the mask editing options in O (Jones et al., 1991) were used to extend the mask.

Figure 18 shows the enlargement of the model during the bootstrapping procedure, plotted as a function of the expansion step. The number of Cα atoms per monomer incorporated in the model is given, as well as the number of correct side chains. Note that after the first round of building in the molecular replacement map (expansion step 'mr'), 37 residues from the molecular replacement search probes had to be deleted from the model, reducing the number of Cα atoms to 294. Subsequent cycles allowed the model to be expanded in small increments.
Figure 19 is a comparison of the Cα trace of a monomer core model (shown in magenta) with that of the complete PPCA monomer (shown in yellow). The core model contained only 294 Cα atoms. The 452-residue PPCA monomer consists of a core domain and a cap domain; the helical and maturation subdomains forming the cap domain are indicated in the figure.
Figure 20A-D illustrates the resolving power of the bootstrapping procedure, showing three different stages in map quality. The atomic coordinates of the refined model are visualized with the electron density in Figures 20B, 20C and 20D. Figures 20A and 20B show the initial 2m|Fo|-D|Fcalc| SigmaA-weighted map calculated using phases from the molecular replacement solution; the electron density is essentially uninterpretable. Figure 20C shows a twofold-averaged 2|Fo|-|Fc| electron density map calculated using inverted phases from cycle bmc6; the density for β-strand Mβ2 (residues 266-271) has become clearly visible. Figure 20D shows an unaveraged 2m|Fobs|-D|Fcalc| SigmaA-weighted map calculated using phases from the refined model. The quality of the density is very good; density for the helix Mα1 (residues 287-293), which assumes a different conformation in the two monomers, is now also apparent.

Figure 21 shows a Ramachandran plot calculated for one monomer from a refined model of a pPPCA. Both monomers in the asymmetric unit give essentially equivalent plots.
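Each point on a Ramachandran plot is a pair of backbone torsion angles, φ (C(i-1)-N-Cα-C) and ψ (N-Cα-C-N(i+1)). A minimal pure-Python sketch of the torsion-angle calculation from atomic coordinates, assuming coordinates are given as (x, y, z) tuples in Å; the helper names are hypothetical, and this is an illustration rather than the program used to produce Figure 21.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (IUPAC sign convention) for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along central bond
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone (phi, psi) in degrees for one residue, from atom coordinates."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)

print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))))  # 90
```

Using atan2 rather than acos keeps the sign of the angle, which is what distinguishes, for example, right-handed from left-handed helical regions on the plot.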

Figure 22 shows a schematic of a computer system for PPCA or pPPCA structure determination and/or rational 
drug design. 

Figure 23.1-52 lists the atomic coordinates for the active site of a pPPCA dimer, comprising portions of at least one of residues 50-76, 144-155, 173-197, 226-253, 226-288, 294-310, 327-344, 338-350, 366-381 and 423-436 of the 452 amino acids (designated 1-452) of monomer 1 (Figure 23.1-23.26), as well as the corresponding portions of the 452 amino acids (designated 1001-1452) of monomer 2 (Figure 23.26-23.52).
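The numbering convention for Figure 23 (1-452 for monomer 1, 1001-1452 for monomer 2) can be handled with a small helper that maps a designated residue number back to its monomer and sequence position. A minimal sketch; the helper names are hypothetical, and the residue ranges are copied from the Figure 23 description.

```python
# Residue ranges listed for the active-site coordinates (monomer 1 numbering)
ACTIVE_SITE_RANGES = [(50, 76), (144, 155), (173, 197), (226, 253), (226, 288),
                      (294, 310), (327, 344), (338, 350), (366, 381), (423, 436)]

MONOMER2_OFFSET = 1000  # monomer 2 residues are designated 1001-1452

def split_designation(resnum):
    """Map a designated residue number to (monomer, residue number 1-452)."""
    if 1 <= resnum <= 452:
        return 1, resnum
    if 1001 <= resnum <= 1452:
        return 2, resnum - MONOMER2_OFFSET
    raise ValueError(f"residue number {resnum} outside 1-452 and 1001-1452")

def in_listed_ranges(resnum):
    """True if the residue (from either monomer) falls in one of the listed ranges."""
    _, local = split_designation(resnum)
    return any(lo <= local <= hi for lo, hi in ACTIVE_SITE_RANGES)

print(split_designation(1150))  # (2, 150)
```

For example, residue 1150 is Ser 150 of monomer 2, and all three catalytic triad residues (Ser 150, Asp 372, His 429) fall within the listed ranges.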

Detailed Description of the Preferred Embodiments 
The present invention provides methods for expressing, purifying and crystallizing a protective protein/cathepsin A (PPCA) or a precursor protective protein/cathepsin A (pPPCA), where the crystals diffract x-rays with sufficiently high resolution to allow determination of the three-dimensional structure of the PPCA or pPPCA, or a portion or subdomain thereof. The three-dimensional structure (e.g., as provided on computer readable media of the present invention) is useful for rational drug design of ligands of a PPCA or a pPPCA. Such ligands can be synthesized or recombinantly produced and are useful as diagnostic agents or drugs for diagnosing, treating, inhibiting or preventing at least one PPCA- or pPPCA-related pathology.

The structure is determined using the PPCA or pPPCA amino acid sequences and/or atomic coordinate/x-ray diffraction data, which are analyzed to provide atomic model output data corresponding to the three-dimensional structure, e.g., as provided on computer readable media. Computer analysis of the atomic coordinate/x-ray diffraction data and/or the amino acid sequence allows the calculation of the secondary, tertiary and/or quaternary structures, domains and/or subdomains of the protein. These domains are combined and refined by additional calculations using suitable computer subroutines to determine the most probable or actual three-dimensional structure of the PPCA or pPPCA, including potential or actual active sites, binding sites or other structural or functional domains or subdomains of the protein.

Structure determination methods are also provided by the present invention for rational drug design (RDD) of PPCA or pPPCA ligands. Such drug design uses computer modeling programs that identify molecules expected to interact with the determined active sites, binding sites, or other structural or functional domains or subdomains of a PPCA or a pPPCA. These ligands can then be produced and screened for activity in modulating or binding to a PPCA or pPPCA, according to methods and compositions of the present invention.

The actual PPCA- or pPPCA-ligand complexes can optionally be crystallized and analyzed using x-ray diffraction techniques. The diffraction patterns obtained are similarly used to calculate the three-dimensional interaction of the ligand and the PPCA or pPPCA, to confirm that the ligand binds to, or changes the conformation of, particular domain(s) or subdomain(s) of the PPCA or pPPCA. Such screening methods are selected from assays for at least one biological activity of a PPCA or a pPPCA. The resulting ligands, provided by methods of the present invention, modulate or bind at least one PPCA or pPPCA and are useful for diagnosing, treating or preventing PPCA- or pPPCA-related pathologies in animals, such as humans. Ligands of a particular PPCA or pPPCA can similarly modulate other PPCAs or pPPCAs from other sources, such as other eukaryotes.
A PPCA or pPPCA is also provided as a crystallized protein suitable for x-ray diffraction analysis. The x-ray diffraction patterns obtained by the x-ray analysis are of moderate, moderately high, or high resolution, e.g., 30-10 Å, 10-3.5 Å or 3.5-1.5 Å, respectively, with higher resolutions also included. These diffraction patterns are suitable and useful for three-dimensional structure determination of a PPCA or a pPPCA, or a domain or subdomain thereof.
The determination of the three-dimensional structure of a PPCA or pPPCA has broad utility. Significant sequence identity and conservation of important structural elements are expected to exist among different PPCAs or pPPCAs. Therefore, the three-dimensional structure of one or a few PPCAs or pPPCAs can be used to identify ligands that have diagnostic or therapeutic value for at least one PPCA- or pPPCA-related pathology that may involve PPCAs or pPPCAs having different amino acid sequences.
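Sequence identity between two aligned sequences is commonly expressed as the percentage of identical residues over the aligned positions. Definitions differ in how gaps are treated; the minimal sketch below, using made-up toy sequences rather than PPCA data, counts only columns in which neither sequence has a gap. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded under this definition
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

# Toy alignment: 6 identities out of 7 ungapped columns
print(round(percent_identity("ACDEFG-H", "ACDQFGKH"), 1))  # 85.7
```

Dividing by the length of the shorter sequence, or by the full alignment length, gives different values for the same alignment, so the definition should be stated alongside any identity figure.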

Determination of Protein Structures

Different techniques give different and complementary information about protein structure. The primary structure is obtained by biochemical methods, either by direct determination of the amino acid sequence from the protein, or from the nucleotide sequence of the corresponding gene or cDNA. The quaternary structure of large proteins or aggregates can also be determined by electron microscopy. To obtain the secondary and tertiary structure, which requires detailed information about the arrangement of atoms within a protein, x-ray crystallography is preferred. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

The first prerequisite for solving the three-dimensional structure of a protein by x-ray crystallography is a well-ordered crystal that will diffract x-rays strongly. The crystallographic method directs a beam of x-rays onto a regular, repeating array of many identical molecules, so that the x-rays are diffracted from it in a pattern from which the structure of an individual molecule can be retrieved. Globular protein molecules are large, roughly spherical or ellipsoidal objects with irregular surfaces, and their crystals therefore contain large holes or channels between the individual molecules. These channels, which usually occupy more than half the volume of the crystal, are filled with disordered solvent molecules. The protein molecules are in contact with each other at only a few small regions. This is one reason why structures of proteins determined by x-ray crystallography are generally the same as those of the proteins in solution.
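The solvent content of a protein crystal can be estimated from the unit-cell volume and the protein mass with the Matthews coefficient, V_M = V_cell / (Z × M), where Z is the number of molecules in the unit cell and M is the molecular mass in daltons; assuming a protein partial specific volume of about 0.74 cm³/g, the solvent fraction is then approximately 1 - 1.23/V_M. A minimal sketch with hypothetical cell parameters (not those of a PPCA crystal); the function names are illustrative.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (cubic angstroms) from edges (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)

def matthews_coefficient(volume, mass_da, molecules_per_cell):
    """V_M in cubic angstroms per dalton."""
    return volume / (mass_da * molecules_per_cell)

def solvent_fraction(vm):
    """Approximate solvent fraction, 1 - 1.23 / V_M (partial specific volume ~0.74 cm^3/g)."""
    return 1.0 - 1.23 / vm

# Hypothetical example: 100 A cubic cell holding eight 50 kDa molecules
vm = matthews_coefficient(cell_volume(100.0, 100.0, 100.0), 50_000, 8)
print(vm, round(solvent_fraction(vm), 3))  # 2.5 0.508
```

In this hypothetical case about 51% of the crystal volume is solvent, in line with the statement that the solvent channels usually occupy more than half of the crystal.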

The formation of crystals is dependent on a number of different parameters, including pH, temperature, protein concentration, the nature of the solvent and precipitant, and the presence of added ions or ligands. Many routine crystallization experiments may be needed to screen these parameters for the few combinations that might give crystals suitable for x-ray diffraction analysis. Crystallization robots can automate and speed up the work of reproducibly setting up large numbers of crystallization experiments.

A pure and homogeneous protein sample is important for successful crystallization. Proteins obtained from cloned genes in efficient expression vectors can be purified quickly to homogeneity in large quantities in a few purification steps. A protein to be crystallized is preferably at least 93-99% pure according to standard criteria of homogeneity. Crystals form when molecules are precipitated very slowly from supersaturated solutions. The most frequently used procedure for making protein crystals is the hanging-drop method, in which a drop of protein solution is brought very gradually to supersaturation by loss of water from the droplet to a larger reservoir that contains salt or polyethylene glycol solution.
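The concentrating effect of vapor diffusion in a hanging drop can be estimated with a simple mass balance. A minimal sketch, assuming an idealized drop in which the precipitant is non-volatile, water leaves until the drop precipitant concentration matches that of a much larger reservoir, and volumes are additive; real drops only approximate this behavior, and the function name is hypothetical.

```python
def hanging_drop_equilibrium(drop_volume_ul, protein_mg_per_ml,
                             drop_precipitant, reservoir_precipitant):
    """Idealized final drop volume (ul) and protein concentration (mg/ml).

    Precipitant concentrations may be in any consistent unit (e.g. % w/v).
    """
    if not 0 < drop_precipitant <= reservoir_precipitant:
        raise ValueError("drop precipitant must be positive and not exceed the reservoir")
    # Non-volatile precipitant: amount in the drop is conserved while water leaves
    final_volume = drop_volume_ul * drop_precipitant / reservoir_precipitant
    # Protein amount is conserved as well, so its concentration rises by the same factor
    final_protein = protein_mg_per_ml * drop_volume_ul / final_volume
    return final_volume, final_protein

print(hanging_drop_equilibrium(4.0, 5.0, 10.0, 20.0))  # (2.0, 10.0)
```

For example, 2 µl of protein at 10 mg/ml mixed with 2 µl of a 20% precipitant reservoir gives a 4 µl drop at 5 mg/ml protein and 10% precipitant; under the idealized assumptions the drop shrinks to about 2 µl and the protein returns to about 10 mg/ml, now in 20% precipitant.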

Different crystal forms can be more or less well-ordered and hence give diffraction patterns of different quality. As a general rule, the more closely the protein molecules pack, and consequently the less water the crystals contain, the better the diffraction pattern, because the molecules are better ordered in the crystal.

X-rays are electromagnetic radiation of short wavelength, emitted when electrons jump from a higher to a lower energy state. In conventional laboratory sources, x-rays are produced by high-voltage tubes in which a metal plate, the anode, is bombarded with accelerated electrons and thereby caused to emit x-rays of a specific wavelength, so-called monochromatic x-rays. The electron bombardment rapidly heats the metal plate, which therefore has to be cooled. Efficient cooling is achieved in so-called rotating anode x-ray generators, in which the metal plate revolves during the experiment so that different parts of it are heated.

More powerful x-ray beams can be produced in synchrotron storage rings, where electrons (or positrons) travel close to the speed of light. These particles emit very strong radiation at all wavelengths from short gamma rays to visible light. When a storage ring is used as an x-ray source, only radiation within a window of suitable wavelengths is channeled from it. Polychromatic x-ray beams are produced by having a broad window that allows through x-ray radiation with wavelengths of 0.2-3.5 Å.
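Wavelength and photon energy are related by E = hc/λ, with hc ≈ 12.398 keV·Å, so the 0.2-3.5 Å window corresponds to photon energies of roughly 62 keV down to about 3.5 keV. A minimal sketch of the conversion; the function names are illustrative.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV*angstrom

def energy_kev(wavelength_in_angstrom):
    """Photon energy (keV) for a given x-ray wavelength (angstrom)."""
    return HC_KEV_ANGSTROM / wavelength_in_angstrom

def wavelength_angstrom(energy_in_kev):
    """X-ray wavelength (angstrom) for a given photon energy (keV)."""
    return HC_KEV_ANGSTROM / energy_in_kev

print(round(energy_kev(0.2), 1), round(energy_kev(3.5), 2))  # 62.0 3.54
```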

In diffraction experiments, a narrow and parallel beam of x-rays is taken out from the x-ray source and directed onto the crystal to produce diffracted beams. The incident primary beam causes damage to both protein and solvent molecules. The crystal is, therefore, usually cooled to prolong its lifetime (e.g., -220 to -50°C). The primary beam must strike the crystal from many different directions to produce all possible diffraction spots, and so the crystal is rotated in the beam during the experiment.

The diffracted spots are recorded either on film, the classical method, or by an electronic detector. Exposed film has to be measured and digitized by a scanning device, whereas electronic detectors feed the signals they detect directly in digitized form into a computer. Electronic area detectors (an electronic film) significantly reduce the time required to collect and measure diffraction data.

When the primary beam from an x-ray source strikes the crystal, some of the x-rays interact with the electrons on each atom and cause them to oscillate. The oscillating electrons serve as a new source of x-rays, which are emitted in almost all directions, a process referred to as scattering. When atoms (and hence their electrons) are arranged in a regular three-dimensional array, as in a crystal, the x-rays emitted from the oscillating electrons interfere with one another. In most cases these x-rays, colliding from different directions, cancel each other out; those from certain directions, however, add together to produce diffracted beams of radiation that can be recorded as a pattern on a photographic plate or detector.

The diffraction pattern obtained in an x-ray experiment is related to the crystal that caused the diffraction. X-rays that are reflected from adjacent planes travel different distances, and diffraction only occurs when the difference in distance is equal to the wavelength of the x-ray beam. This distance is dependent on the reflection angle, which is equal to the angle between the primary beam and the planes.

The relationship between the reflection angle (θ), the distance between the planes (d), and the wavelength (λ) is given by Bragg's law, $2d\sin\theta = n\lambda$, where n is an integer (n = 1 for a first-order reflection). This relation can be used to determine the size of the unit cell in the crystal: the position of each spot on the film relates that spot to a specific set of planes through the crystal, and Bragg's law converts these positions into plane spacings and hence into unit-cell dimensions.
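As a rough numerical illustration of Bragg's law, nλ = 2d sin θ, the minimal Python sketch below converts a reflection angle into a plane spacing and then into one cell edge. The wavelength (about 1.5418 Å, copper Kα), the 10° angle, and the assumption that the spot is the (2 0 0) reflection of an orthorhombic cell are illustrative choices, not values from this specification.

```python
import math

def bragg_d_spacing(theta_deg, wavelength, n=1):
    """Plane spacing d from Bragg's law, n * wavelength = 2 * d * sin(theta).

    theta_deg is the reflection angle in degrees; d is returned in the
    same units as wavelength (here angstroms).
    """
    return n * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

def orthorhombic_d_spacing(h, k, l, a, b, c):
    """Spacing of the (hkl) planes in an orthorhombic cell with edges a, b, c."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)

# Illustrative numbers: Cu K-alpha (~1.5418 A), spot observed at theta = 10 degrees
d = bragg_d_spacing(10.0, 1.5418)
# If that spot is indexed as (2 0 0) of an orthorhombic cell, then d = a / 2
a = 2.0 * d
print(f"d = {d:.3f} A, a = {a:.2f} A")
```

Real unit-cell determination indexes many spots at once; the single-spot case only shows how the geometry and Bragg's law combine.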

Each atom in a crystal scatters x-rays in all directions, and only those x-rays that interfere constructively with one another, according to Bragg's law, give rise to diffracted beams that can be recorded as distinct diffraction spots above background. Each diffraction spot is the result of the interference of all x-rays with the same diffraction angle emerging from all atoms. For example, for the protein crystal of myoglobin, each of the approximately 20,000 diffracted beams that have been measured contains scattered x-rays from each of the approximately 1500 atoms in the molecule. Extracting information about individual atoms from such a system requires considerable computation. The mathematical tool used to handle such problems is the Fourier transform.
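The interference sum described above is usually written as the structure factor F(hkl) = Σ_j f_j exp[2πi(h x_j + k y_j + l z_j)], with x_j, y_j, z_j the fractional coordinates of atom j. The Python sketch below evaluates it for a toy three-atom cell. Treating each scattering factor f_j as a constant (roughly the atom's electron count) and ignoring thermal motion are simplifying assumptions, and the coordinates are hypothetical.

```python
import cmath
import math

def structure_factor(hkl, atoms):
    """Sum the waves scattered by every atom into one diffracted beam (hkl).

    atoms is a list of (f, x, y, z): a scattering factor f (here simply a
    constant) and fractional coordinates. Returns the complex F(hkl).
    """
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

# Toy unit cell with three light atoms (hypothetical coordinates)
atoms = [(6, 0.10, 0.20, 0.30), (7, 0.40, 0.15, 0.70), (8, 0.75, 0.60, 0.05)]
F = structure_factor((1, 2, 3), atoms)
print(f"|F| = {abs(F):.2f}, phase = {math.degrees(cmath.phase(F)):.1f} deg")
```

For a protein such as myoglobin the same sum would run over about 1500 atoms for each of the roughly 20,000 reflections, which is why the computation is done with Fourier methods on a computer.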

Each diffracted beam, which is recorded as a spot on the film, is defined by three properties: the amplitude, which can be measured from the intensity of the spot; the wavelength, which is set by the x-ray source; and the phase, which is lost in x-ray experiments. All three properties are needed for all of the diffracted beams in order to determine the positions of the atoms giving rise to them.
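To see why the lost phase matters, the Python sketch below builds a hypothetical one-dimensional "crystal" of two point atoms, computes its structure factors, and Fourier-synthesizes the density twice: once with the true phases, and once with the amplitudes alone (all phases set to zero), which is all that the recorded intensities provide. The atom positions, scattering factors and number of reflections are invented for illustration.

```python
import cmath
import math

# Hypothetical one-dimensional cell with two point atoms: (scattering factor, fractional x)
atoms = [(8.0, 0.20), (6.0, 0.55)]
H = 10  # reflections h = -H..H are included

# Complex structure factors F(h) = sum_j f_j exp(2 pi i h x_j)
F = {h: sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms) for h in range(-H, H + 1)}

def density(coeffs, x):
    """Fourier synthesis rho(x) = sum_h F(h) exp(-2 pi i h x); real because F(-h) = conj(F(h))."""
    return sum(c * cmath.exp(-2j * math.pi * h * x) for h, c in coeffs.items()).real

grid = [i / 100 for i in range(100)]
true_map = [density(F, x) for x in grid]
# Same amplitudes |F(h)| (what the intensities give), but every phase set to zero
no_phase = {h: abs(c) for h, c in F.items()}
wrong_map = [density(no_phase, x) for x in grid]

print("peak with correct phases at x =", grid[true_map.index(max(true_map))])
print("peak with phases discarded at x =", grid[wrong_map.index(max(wrong_map))])
```

With the correct phases the highest peak falls on the heavier atom at x = 0.20; with the phases discarded it falls at the origin, where every cosine term is positive, regardless of where the atoms actually are.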

For larger molecules, protein crystallographers have in many cases determined the phases using a method called multiple isomorphous replacement (MIR) (including heavy-metal scattering), which requires the introduction of new x-ray scatterers into the unit cell of the crystal. These additions are usually heavy atoms, so that they make a significant contribution to the diffraction pattern; there should not be too many of them, so that their positions can be located; and they should not change the structure of the molecule or of the crystal cell, i.e., the crystals should be isomorphous. Isomorphous replacement is usually done by diffusing different heavy-metal complexes into the channels of the preformed protein crystals. The protein molecules expose side chains (such as SH groups) into these solvent channels that are able to bind heavy metals. It is also possible to replace endogenous light metals in metalloproteins with heavier ones, e.g., zinc by mercury, or calcium by samarium.

Since such heavy metals contain many more electrons than the light atoms (H, N, C, O, and S) of the protein, they scatter x-rays more strongly. All diffracted beams would therefore increase in intensity after heavy-metal substitution if all interference were positive. In fact, however, some interference is negative; consequently, following heavy-metal substitution, some spots measurably increase in intensity, others decrease, and many show no detectable difference.
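The mixed increases and decreases follow from adding the heavy-atom wave F_H to the protein wave F_P as complex numbers, F_PH = F_P + F_H. The Python sketch below keeps |F_P| and F_H fixed and varies only the protein phase. All amplitudes, phases and the 5% "detectable change" threshold are hypothetical.

```python
import cmath
import math

def intensity_change(F_P, F_H, threshold=0.05):
    """Relative intensity change when the heavy-atom wave F_H is added to the protein wave F_P.

    Returns (fractional change, label); threshold is a hypothetical detection limit.
    """
    change = abs(F_P + F_H) ** 2 / abs(F_P) ** 2 - 1.0
    if change > threshold:
        label = "increase"
    elif change < -threshold:
        label = "decrease"
    else:
        label = "no detectable change"
    return change, label

F_H = cmath.rect(30.0, 0.0)  # hypothetical heavy-atom contribution, phase 0
results = []
for phase_deg in (0, 90, 100, 180):  # same protein amplitude, different protein phases
    F_P = cmath.rect(100.0, math.radians(phase_deg))
    change, label = intensity_change(F_P, F_H)
    results.append(label)
    print(f"protein phase {phase_deg:3d} deg: {change:+.1%} ({label})")
```

The heavy atom always adds electrons, yet whether a given spot gets stronger or weaker depends on the relative phase of the two waves.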

Phase differences between diffracted spots can be determined from intensity changes following heavy-metal substitution. First, the intensity differences are used to deduce the positions of the heavy atoms in the crystal unit cell. Fourier summations of these intensity differences give maps of the vectors between the heavy atoms, the so-called Patterson maps. From these vector maps the atomic arrangement of the heavy atoms is deduced. From the positions of the heavy metals in the unit cell, one can calculate the amplitudes and phases of their contribution to the diffracted beams of protein crystals containing heavy metals.
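The Patterson step can be illustrated with a one-dimensional toy model in Python: intensities alone are Fourier-summed, P(u) = Σ_h |F(h)|² cos(2πhu), and the largest peak away from the origin appears at the separation between the two heavy atoms. The cell, atom positions and scattering factors are hypothetical. Real Patterson maps are three-dimensional and are usually computed from derivative-minus-native intensity differences rather than from a heavy-atom-only structure as here.

```python
import cmath
import math

# Hypothetical 1-D cell with two heavy atoms: (scattering factor, fractional x)
heavy_atoms = [(80.0, 0.10), (80.0, 0.35)]
H = 20

# Only intensities |F(h)|^2 are used below; phases never enter the calculation
intensities = {
    h: abs(sum(f * cmath.exp(2j * math.pi * h * x) for f, x in heavy_atoms)) ** 2
    for h in range(-H, H + 1)
}

def patterson(u):
    """P(u) = sum_h |F(h)|^2 cos(2 pi h u); peaks lie at vectors between atoms."""
    return sum(I * math.cos(2 * math.pi * h * u) for h, I in intensities.items())

grid = [i / 100 for i in range(100)]
P = [patterson(u) for u in grid]
# Skip the origin peak (each atom's vector to itself) and search 0.05 <= u <= 0.50
best = max(range(5, 51), key=lambda i: P[i])
print(f"largest non-origin Patterson peak at u = {grid[best]:.2f}")  # 0.35 - 0.10
```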

This knowledge is then used to find the phase of the contribution from the protein in the absence of the heavy-metal atoms. The phase and amplitude of the heavy-metal contribution are known, as are the amplitude of the protein alone and the amplitude of the protein plus heavy metals (i.e., the protein heavy-metal complex); thus one phase and three amplitudes are known. From these, one can calculate whether the interference between the x-rays scattered by the heavy metals and by the protein is constructive or destructive. The extent of positive or negative interference, together with the phase of the heavy metal, gives an estimate of the phase of the protein. Because this yields two different phase angles that are equally good solutions, a second heavy-metal complex can be used, which also gives two possible phase angles. Only one of these will have the same value as one of the two previous phase angles; it therefore represents the correct phase angle. In practice, more than two different heavy-metal complexes are usually made in order to give a reasonably good phase determination for all reflections. Each individual phase estimate contains experimental errors arising from errors in the measured amplitudes. Furthermore, for many reflections the intensity differences after one particular isomorphous replacement are too small to measure, and other derivatives can be tried.
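The two-solution ambiguity and its resolution by a second derivative follow from the vector relation F_PH = F_P + F_H, which gives |F_PH|² = |F_P|² + |F_H|² + 2|F_P||F_H|cos(φ_P - φ_H). The Python sketch below applies this to simulated, error-free data for a single reflection. The amplitudes and heavy-atom phases are hypothetical, and practical phasing programs weight many noisy estimates statistically rather than simply picking the closest match as done here.

```python
import cmath
import math

def sir_phases(fp, fph, F_H):
    """Two protein phases (degrees) consistent with one heavy-atom derivative.

    fp, fph: measured amplitudes |F_P| and |F_PH|; F_H: heavy-atom contribution
    (amplitude and phase, known from the heavy-atom positions).
    """
    fh, phi_h = abs(F_H), cmath.phase(F_H)
    cos_delta = (fph ** 2 - fp ** 2 - fh ** 2) / (2.0 * fp * fh)
    delta = math.acos(max(-1.0, min(1.0, cos_delta)))  # clamp: errors can push it past +/-1
    return [math.degrees(phi_h + s * delta) % 360.0 for s in (1, -1)]

def angular_distance(a, b):
    """Smallest difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def mir_phase(fp, derivatives):
    """Choose the candidate from the first derivative that the other derivatives agree with."""
    candidate_sets = [sir_phases(fp, fph, F_H) for fph, F_H in derivatives]
    def disagreement(phi):
        return sum(min(angular_distance(phi, c) for c in cands) for cands in candidate_sets)
    return min(candidate_sets[0], key=disagreement)

# Simulated, error-free data for one reflection (all numbers hypothetical)
F_P = cmath.rect(100.0, math.radians(40.0))   # true protein structure factor
F_H1 = cmath.rect(30.0, math.radians(0.0))    # heavy-atom contribution, derivative 1
F_H2 = cmath.rect(25.0, math.radians(120.0))  # heavy-atom contribution, derivative 2
derivatives = [(abs(F_P + F_H1), F_H1), (abs(F_P + F_H2), F_H2)]

print("derivative 1 alone:", [round(p, 1) for p in sir_phases(abs(F_P), *derivatives[0])])
print("derivative 2 alone:", [round(p, 1) for p in sir_phases(abs(F_P), *derivatives[1])])
print("common solution:", round(mir_phase(abs(F_P), derivatives), 1))
```

Each derivative alone offers two phases (40° or 320°, and 200° or 40°); only 40° is shared, as the text describes.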

The amplitudes and the phases of the diffraction data from the protein crystals are used to calculate an electron-density map of the repeating unit of the crystal. This map then has to be interpreted as a polypeptide chain with a particular amino acid sequence. The interpretation of the electron-density map is made more complex by several limitations of the data. First of all, the map itself contains errors, mainly due to errors in the phase angles. In addition, the quality of the map depends on the resolution of the diffraction data, which in turn depends on how well ordered the crystals are. This directly influences the image that can be produced. The resolution is measured in ångström (Å) units; the smaller this number is, the higher the resolution and therefore the greater the amount of detail that can be seen.
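The effect of resolution on map detail can be sketched by truncating a Fourier synthesis: in a one-dimensional cell of edge a, including reflections up to index h_max corresponds to a resolution of d_min = a / h_max. The Python example below places two hypothetical point atoms 1.5 Å apart in a 20 Å cell and checks whether the density dips between them. Idealized point atoms and a sharp cut-off are simplifying assumptions; real maps use scattering factors that fall off with angle.

```python
import cmath
import math

CELL = 20.0  # hypothetical 1-D cell edge, in angstroms
# Two point atoms 1.5 A apart: (scattering factor, fractional x)
atoms = [(6.0, 10.0 / CELL), (6.0, 11.5 / CELL)]

def density(x, h_max):
    """Fourier synthesis using only reflections |h| <= h_max, i.e. resolution d_min = CELL / h_max."""
    total = 0.0
    for h in range(-h_max, h_max + 1):
        F = sum(f * cmath.exp(2j * math.pi * h * xj) for f, xj in atoms)
        total += (F * cmath.exp(-2j * math.pi * h * x)).real
    return total

def resolved(h_max):
    """True if the density dips between the two atoms instead of merging into one blob."""
    x1, x2 = atoms[0][1], atoms[1][1]
    return density((x1 + x2) / 2, h_max) < density(x1, h_max)

for h_max in (20, 6):
    print(f"d_min = {CELL / h_max:.1f} A: atoms resolved = {resolved(h_max)}")
```

At 1.0 Å the two atoms appear as separate peaks; at about 3.3 Å they merge, which is why sequence fitting into unresolved density becomes necessary at typical protein resolutions.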

Building the initial model is a trial-and-error process. First, one has to decide how the polypeptide chain weaves its way through the electron-density map. The resulting chain trace constitutes a hypothesis, by which one tries to match the density of the side chains to the known sequence of the polypeptide. When a reasonable chain trace has finally been obtained, an initial model is built to give the best fit of the atoms to the electron density. Computer graphics are used both for chain tracing and for model building, to present the data and to manipulate the models.

The initial model will contain some errors. Provided the protein crystals diffract to high enough resolution (e.g., better than 3.5 Å), most or substantially all of the errors can be removed by crystallographic refinement of the model using computer algorithms. In this process, the model is changed to minimize the difference between the experimentally observed diffraction amplitudes and those calculated for a hypothetical crystal containing the model (instead of the real molecule). This difference is expressed as an R factor (residual disagreement), which is 0.0 for exact agreement and about 0.59 for total disagreement (a model unrelated to the true structure).
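The R factor described above is commonly computed as R = Σ | |F_obs| - k|F_calc| | / Σ |F_obs| over all reflections, where k puts the calculated amplitudes on the scale of the observed ones. The Python sketch below uses a least-squares scale factor when none is supplied; that choice, and the amplitudes in the example, are assumptions for illustration, since refinement programs apply their own scaling schemes.

```python
def r_factor(f_obs, f_calc, scale=None):
    """Crystallographic R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|).

    f_obs, f_calc: observed and calculated amplitudes for the same reflections.
    scale: factor k putting |Fcalc| on the scale of |Fobs|; if omitted, the
    least-squares value k = sum(Fo*Fc) / sum(Fc^2) is used (an assumption).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    if scale is None:
        scale = sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)
    return sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

# Hypothetical amplitudes for five reflections
f_obs = [120.0, 85.0, 60.0, 42.0, 30.0]
f_calc = [115.0, 90.0, 55.0, 45.0, 26.0]
print(f"R = {r_factor(f_obs, f_calc):.3f}")
```

A model that matches the data up to an overall scale gives R = 0.0, consistent with the "exact agreement" limit in the text.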

In general, the R factor is preferably between 0.15 and 0.35 (such as less than about 0.24-0.28) for a well-determined protein structure. The residual difference is a consequence of errors and imperfections in the data. These derive from various sources, including slight variations in the conformation of the protein molecules, as well as inaccurate corrections both for the presence of solvent and for differences in the orientation of the microcrystals from which the crystal is built. This means that the final model represents an average of molecules that are slightly different both in conformation and orientation.

In refined structures at high resolution, there are usually no major errors in the orientation of individual residues, and the estimated errors in atomic positions are usually around 0.1-0.2 Å, provided the amino acid sequence is known. Hydrogen bonds, both within the protein and to bound ligands, can be identified with a high degree of confidence.

Most x-ray structures are determined to a resolution between 1.7 Å and 3.5 Å. Electron-density maps in this resolution range are preferably interpreted by fitting the known amino acid sequence into regions of electron density in which individual atoms are not resolved.

Knowledge of the amino acid sequence is preferred for accurate x-ray structure determination. Thus, recombinant DNA techniques have had a double impact on x-ray structural work. When a protein is cloned and overexpressed for structural studies, the amino acid sequence needed for the x-ray work is also quickly obtained via the nucleotide sequence. Recombinant DNA techniques provide not only abundant supplies of rare proteins, but also their amino acid sequences as a bonus. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
Isolated PPCA and pPPCA Polypeptides

A PPCA or pPPCA polypeptide can refer to any subset of a PPCA or pPPCA, such as a domain, subdomain, fragment, consensus sequence or repeating unit thereof. A PPCA or pPPCA polypeptide of the present invention can be prepared by, e.g.:
(a) recombinant DNA methods;

(b) proteolytic digestion of the intact molecule or a domain, subdomain or fragment thereof; 

(c) chemical peptide synthesis methods well-known in the art; and/or 

(d) any other method capable of producing a PPCA or pPPCA polypeptide having a conformation similar to a structural or functional subdomain of a PPCA or a pPPCA.

A biological activity of PPCA or pPPCA can be screened according to known screening assays. The minimum peptide sequence having activity is based on the smallest unit containing or comprising a particular domain, subdomain, fragment, region, consensus sequence, or repeating unit thereof that has at least one biological activity of a PPCA or pPPCA, such as protecting activity, inhibiting activity or enzyme activity. Non-limiting examples of such activities are: protecting activity for β-galactosidase or neuraminidase (NA); modulating activity (inhibition, stimulation or activation); enzyme activity toward endothelin I (serine carboxypeptidase activity) or cathepsin A activity; and peptide-hydrolyzing activity (e.g., toward substance P and substance P-free acid; oxytocin and oxytocin-free acid; neurokinin A; angiotensin I; and bradykinin).

According to the present invention, a PPCA or pPPCA includes an association of two or more polypeptide subdomains, such as at least one 4-amino-acid portion of a core or cap domain of a PPCA or pPPCA. This can include 1-14 subdomains of the cap domain and/or 1-44 subdomains of the core domain (as monomers or dimers), or any range, value or combination thereof. Preferably, 1-4 sets of each of at least one core or cap domain or subdomain are included.

The structure of a monomer or domain of at least one PPCA or pPPCA of the present invention can include one or more of the following subdomains, as described herein. Generally, a PPCA or pPPCA consists of a dimer, each monomer comprising a core domain and a cap domain with the following subdomains having the specified residues, e.g., as presented in Figure 13 (pPPCA) or Figure 14 (PPCA):
Core domain subdomains: Cβ1, 21-27; Cβ2, 32-39; Cβ3, 50-54; Cα1, 63-67; Cβ4, 73-75; Cβ5, 82-84; Cβ6, 94-98; Cα2, 118-135; Cβ7, 144-149; Cα3, 152-163; Cβ8, 171-177; Cα4, 307-313; Cα5, 316-321; Cα6, 336-341; Cα7, 350-359; Cβ9, 363-369; Cα8, 377-386; Cβ10, 391-401; Cβ11, 407-416; Cβ12, 419-424; Cα9, 431-434; Cα10, 436-447; and

Cap domain subdomains: Hα1, 183-196; Hα2, 202-212; Hα3, 226-240; Mβ1, 261-264; Mβ2, 267-270; Mα1, 290-293; Mβ3, 296-299. Note that for monomer 2, the secondary structure assignments in the cap domain are slightly different from those in monomer 1.

A PPCA or pPPCA polypeptide of the invention can have at least 80% homology, such as 80-100% overall homology or identity, with one or more corresponding PPCA or pPPCA subdomains or fragments as described herein, such as a 4-542 amino acid fragment or portion of the amino acid sequence of Figures 13, 14 or 15. As would be understood by one of ordinary skill in the art, the above configurations of subdomains are provided as part of a PPCA or pPPCA polypeptide of the invention, when expressed in a suitable host cell or otherwise synthesized, to provide at least one structural or functional feature of a native PPCA or pPPCA, such as at least one PPCA-related biological activity. Such activities can be assayed using a suitable assay, to establish at least one PPCA biological activity of one or more PPCAs or pPPCAs of the invention. A PPCA or pPPCA polypeptide of the invention is either not naturally occurring, or is naturally occurring but in a purified or isolated form which does not occur in nature. Examples of suitable PPCA activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)); endothelin I deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)); and tachykinin deamidase activity (Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)).

Percent homology or identity can be determined, for example, by comparing sequence information using the GAP computer program, version 6.0, available from the University of Wisconsin Genetics Computer Group (UWGCG). The GAP program utilizes the alignment method of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), as revised by Smith and Waterman (Adv. Appl. Math. 2:482 (1981)). Briefly, the GAP program defines similarity as the number of aligned symbols (i.e., nucleotides or amino acids) which are similar, divided by the total number of symbols in the shorter of the two sequences. The preferred default parameters for the GAP program include: (1) a unitary comparison matrix (containing a value of 1 for identities and 0 for non-identities) and the weighted comparison matrix of Gribskov and Burgess, Nucl. Acids Res. 14:6745 (1986), as described by Schwartz and Dayhoff, eds., ATLAS OF PROTEIN SEQUENCE AND STRUCTURE, National Biomedical Research Foundation, pp. 353-358 (1979); (2) a penalty of 3.0 for each gap and an additional 0.10 penalty for each symbol in each gap; and (3) no penalty for end gaps.
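The scoring rules just listed can be applied to an alignment that has already been produced. The Python sketch below computes identity with a unitary matrix divided by the length of the shorter ungapped sequence, and a gap penalty of 3.0 per gap plus 0.10 per gap symbol with end gaps free. It does not reproduce the GAP program's Needleman-Wunsch alignment search, and the aligned fragments in the example are hypothetical (not PPCA sequences).

```python
from itertools import groupby

def gap_style_scores(aln_a, aln_b, gap_weight=3.0, length_weight=0.10):
    """Score an existing pairwise alignment ('-' marks a gap).

    Returns (percent identity, total gap penalty):
    - identity uses a unitary matrix (1 for identical residues, 0 otherwise) and is
      divided by the length of the shorter ungapped sequence;
    - each internal gap costs gap_weight plus length_weight per gap symbol;
    - end gaps are not penalized.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    shorter = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    percent_identity = 100.0 * identities / shorter

    penalty = 0.0
    for row in (aln_a, aln_b):
        core = row.strip("-")  # drop leading/trailing (end) gaps
        for is_gap, run in groupby(core, key=lambda ch: ch == "-"):
            if is_gap:
                penalty += gap_weight + length_weight * len(list(run))
    return percent_identity, penalty

# Hypothetical aligned fragments
pid, pen = gap_style_scores("MKT-AYIAKQR", "MKTLAY-AKQ-")
print(f"identity {pid:.1f}%, gap penalty {pen:.2f}")
```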

Thus, one of ordinary skill in the art, given the teachings and guidance presented in the present specification, will know how to add, delete or substitute other amino acid residues in other positions of a PPCA or pPPCA to obtain substituted, deletional or additional variants thereof.

Non-limiting examples of substitutions in a PPCA or pPPCA domain or polypeptide of the invention are those in which at least one amino acid residue in the protein molecule has been removed and a different residue added in its place according to the following Table 2. The types of substitutions which can be made in the protein or peptide molecule of the invention can be based on analysis of the frequencies of amino acid changes between homologous proteins of different species, such as those presented in Figure 15. Based on such an analysis, alternative substitutions are defined herein as exchanges within one of the following five groups:

1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr (Pro, Gly);

2. Polar, negatively charged residues and their amides: Asp, Asn, Glu, Gln;

3. Polar, positively charged residues: His, Arg, Lys;

4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val (Cys); and

5. Large aromatic residues: Phe, Tyr, Trp.
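The five exchange groups above can be encoded directly to flag whether a proposed substitution stays within one group. The Python sketch below uses one-letter codes; including the residues shown in parentheses (Pro and Gly in group 1, Cys in group 4) and the "E35Q"-style mutation notation are assumptions made for illustration.

```python
# The five exchange groups listed above, in one-letter code.
# Residues shown in parentheses in the text (Pro, Gly in group 1; Cys in group 4)
# are included here; whether to include them is a judgment call.
SUBSTITUTION_GROUPS = [
    set("ASTPG"),  # 1 small aliphatic, nonpolar or slightly polar
    set("DNEQ"),   # 2 polar, negatively charged residues and their amides
    set("HRK"),    # 3 polar, positively charged
    set("MLIVC"),  # 4 large aliphatic, nonpolar
    set("FYW"),    # 5 large aromatic
]

def is_group_exchange(original, replacement):
    """True if both one-letter residue codes fall in the same exchange group."""
    return any(original in g and replacement in g for g in SUBSTITUTION_GROUPS)

def classify_mutation(mutation):
    """Parse a hypothetical 'E35Q'-style string (from, position, to) and test the exchange."""
    original, position, replacement = mutation[0], int(mutation[1:-1]), mutation[-1]
    return position, is_group_exchange(original, replacement)

for m in ("L20I", "E35Q", "E35K", "Y50F"):
    print(m, classify_mutation(m))
```

A within-group exchange is only a candidate for a non-radical change; as the text notes, its actual effect is confirmed with a screening assay.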

Most deletions, additions and substitutions according to the invention are those which do not produce radical changes in the characteristics of the protein or peptide molecule. "Characteristics" is defined in a non-inclusive manner to encompass both changes in secondary structure, e.g., α-helix or β-sheet, and changes in physiological activity, e.g., in biological activity assays. However, when the exact effect of a substitution, deletion, or addition is to be confirmed, one skilled in the art will appreciate that the effect of at least one substitution, addition or deletion will be evaluated by at least one PPCA or pPPCA screening assay, such as, but not limited to, immunoassays or bioassays, to confirm at least one PPCA or pPPCA biological activity.

Surprisingly, a PPCA and/or a pPPCA has now been discovered to have serine carboxypeptidase activity and corresponding structural features, despite having only about 30% sequence identity to wheat and yeast serine carboxypeptidases. These carboxypeptidases are members of the α/β hydrolase fold family (Liao et al., Biochemistry 31:9796-9812 (1992); Endrizzi et al., Biochemistry 33:11106-11120 (1994); Ollis et al., Protein Eng. 5:197-211 (1992)). The serine carboxypeptidases have peptidase activity at acidic pH (pH 4.5-5.5) as well as deamidase and esterase activities at pH 7 (reviewed in Breddam et al., Carlsberg Res. Commun. 51:83-128 (1986); Rawlings & Barrett, Methods in Enzymology 244:19-61 (1994)). Mutagenesis studies and enzymatic assays have revealed that only the mature form of PPCA possesses serine carboxypeptidase activity, which is similar to that of lysosomal cathepsin A and has a preference for hydrophobic substrates such as the dipeptide Phe-Ala (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)). On the basis of sequence alignments with members of the serine carboxypeptidase family, mutagenesis studies and the structure determination of pPPCA, the catalytic triad in PPCA has now been determined to be formed by the residues Ser150, His429 and Asp372.
PPCA and pPPCA Expression for Isolation and Purification 

A nucleic acid sequence encoding a PPCA or a pPPCA (Galjart et al., Cell 54:755-764 (1988)) can be recombined with vector DNA in accordance with conventional techniques, including blunt-ended or staggered-ended termini for ligation, restriction enzyme digestion to provide appropriate termini, filling in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and ligation with appropriate ligases. Techniques for such manipulations are disclosed, e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989); and Ausubel et al., Current Protocols in Molecular Biology, Wiley Interscience, N.Y. (1988-1995), and are well known in the art.

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a polypeptide if it contains nucleotide sequences which contain transcriptional and translational regulatory information and such sequences are "operably linked" to nucleotide sequences which encode the polypeptide. An operable linkage is a linkage in which the regulatory DNA sequences and the DNA sequence sought to be expressed are connected in such a way as to permit gene expression as a PPCA, pPPCA or fragment thereof, in recoverable amounts. The precise nature of the regulatory regions needed for gene expression can vary from organism to organism, as is well known in the analogous art. See, e.g., Sambrook, infra; Ausubel, infra.

The invention accordingly encompasses the expression of a PPCA or a pPPCA in either prokaryotic or eukaryotic cells, although eukaryotic expression is preferred. Preferred hosts are bacterial or eukaryotic hosts, including bacteria, yeast, insects, fungi, bird and mammalian cells either in vivo or in situ, or host cells of mammalian, insect, bird or yeast origin. It is preferred that the mammalian cell or tissue is of human, primate, hamster, rabbit, rodent, cow, pig, sheep, horse, goat, dog or cat origin, but any other mammalian cell can be used.

Eukaryotic hosts can include yeast, insects, fungi, and mammalian cells either in vivo or in tissue culture. Preferred eukaryotic hosts can also include, but are not limited to, insect cells and mammalian cells either in vivo or in tissue culture. Preferred cells include Xenopus oocytes, HeLa cells, cells of fibroblast origin such as VERO or CHO-K1, or cells of lymphoid origin and their derivatives.

Mammalian cells provide post-translational modifications to protein molecules, including correct folding or glycosylation at correct sites. Mammalian cells which can be useful as hosts include cells of fibroblast origin, such as, but not limited to, NIH 3T3, VERO or CHO; cells of lymphoid origin, such as, but not limited to, the hybridoma SP2/0-Ag14 or the murine myeloma P3-X63Ag8; hamster cell lines (e.g., CHO-K1 and progenitors, e.g., CHO-DUXB11); and their derivatives. One preferred type of mammalian cell is a cell intended to replace the function of genetically deficient cells in vivo. Neuronally derived cells are preferred for gene therapy of disorders of the nervous system. For a mammalian cell host, many possible vector systems are available for the expression of at least one PPCA or pPPCA. A wide variety of transcriptional and translational regulatory sequences can be employed, depending upon the nature of the host. The transcriptional and translational regulatory signals can be derived from viral sources, such as, but not limited to, adenovirus, bovine papilloma virus, simian virus, or the like, where the regulatory signals are associated with a particular gene which has a high level of expression. Alternatively, promoters from mammalian expression products, such as, but not limited to, actin, collagen or myosin, can be used for protein production.

When live insects are to be used, silk moth caterpillars and baculoviral vectors are presently preferred hosts for large-scale PPCA or pPPCA production according to the invention. Production of PPCA or pPPCA in insects can be achieved, for example, by infecting the insect host with a baculovirus engineered to express the PPCA or pPPCA polypeptide, by methods known to those skilled in the related arts. See Ausubel, infra, §§16.8-16.11.

In a preferred embodiment, the introduced nucleotide sequence will be incorporated into a plasmid or viral vector capable of autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for this purpose. See, e.g., Ausubel et al., infra, §§1.5, 1.10, 7.1, 7.3, 8.1, 9.6, 9.7, 13.4, 16.2, 16.6, and 16.8-16.11. Factors of importance in selecting a particular plasmid or viral vector include: the ease with which recipient cells that contain the vector can be recognized and selected from those recipient cells which do not contain the vector; the number of copies of the vector which are desired in a particular host; and whether it is desirable to be able to "shuttle" the vector between host cells of different species.
Different host cells have characteristic and specific mechanisms for the translational and post-translational processing and modification (e.g., glycosylation, cleavage) of proteins. Appropriate cell lines or host systems can be chosen to ensure the desired modification and processing of the foreign protein expressed. For example, expression in a bacterial system can be used to produce an unglycosylated core protein product. Expression in yeast will produce a glycosylated product. Expression in mammalian cells can be used to ensure "native" glycosylation of the heterologous PPCA or pPPCA. Furthermore, different vector/host expression systems can effect processing reactions such as proteolytic cleavages to different extents.

As discussed above, expression of PPCA or pPPCA in eukaryotic hosts requires the use of eukaryotic regulatory regions. Such regions will, in general, include a promoter region sufficient to direct the initiation of RNA synthesis. See, e.g., Ausubel, infra; Sambrook, infra.
Once the vector or nucleic acid molecule containing the construct(s) has been prepared for expression, the DNA construct(s) can be introduced into an appropriate host cell by any of a variety of suitable means, e.g., transformation, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like. After the introduction of the vector, recipient cells are grown in a selective medium, which selects for the growth of vector-containing cells. Expression of the cloned gene molecule(s) results in the production of a PPCA or pPPCA. This can take place in the transformed cells as such, or following the induction of these cells to differentiate (for example, by administration of bromodeoxyuracil to neuroblastoma cells or the like).

A PPCA or pPPCA, or fragments thereof, of this invention can be obtained by expression from recombinant DNA according to known methods. Alternatively, a PPCA or pPPCA can be purified from biological material. A PPCA or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse liver, pig kidney, bovine testes, bovine liver, and the like) of various genera and species.

The PPCA or pPPCA can be isolated and purified in accordance with conventional method steps, such as extraction, precipitation, chromatography, affinity chromatography, electrophoresis, or the like. For example, cells expressing at least one PPCA or pPPCA at suitable levels can be collected by centrifugation and lysed with suitable buffers, and the protein isolated by column chromatography, for example, on DEAE-cellulose, phosphocellulose, polyribocytidylic acid-agarose or hydroxyapatite, or by electrophoresis or immunoprecipitation. Alternatively, a pPPCA or PPCA can be isolated by the use of antibodies, such as, but not limited to, a PPCA- or pPPCA-specific antibody. Such antibodies can be obtained by known method steps (see, e.g., Harlow and Lane, ANTIBODIES: A LABORATORY MANUAL, Cold Spring Harbor Laboratory (1988); Colligan et al., eds., Current Protocols in Immunology, Greene Publishing Assoc. and Wiley Interscience, N.Y. (1992, 1993), the contents of which references are entirely incorporated herein by reference).

A PPCA or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse liver, pig kidney, bovine testes, bovine liver, and the like) of various genera and species, using known techniques such as gel filtration, phase separation and affinity chromatography, e.g., using polyclonal or monoclonal antibodies specific for a PPCA or pPPCA, according to known methods. See, e.g., Oxender et al., Protein Engineering, Liss, New York (1986).

Overview of PPCA or pPPCA Purification and Crystallization Methods

In general, a PPCA or pPPCA is isolated in soluble form (e.g., as a monomer or dimer) in sufficient purity and concentration for crystallization. The isolated PPCA or pPPCA is then assayed for biological activity (e.g., cathepsin A activity) and for lack of aggregation (which interferes with crystallization). The purified PPCA or pPPCA preferably runs as a single band for each monomer on reducing or nonreducing polyacrylamide gel electrophoresis (PAGE); nonreducing PAGE is used to evaluate the presence of cysteine (disulfide) bridges.

The purified PPCA or pPPCA is preferably crystallized under varying conditions of at least one of the following: pH, buffer type, buffer concentration, salt type, polymer type, polymer concentration, other precipitating ligands and concentration of purified PPCA or pPPCA. See, e.g., known methods (Blundell et al., Protein Crystallography, Academic Press, London (1976); Oxender, infra; McPherson, The Preparation and Analysis of Protein Crystals, Wiley Interscience, N.Y. (1982)) or methods provided in a commercial kit, such as CRYSTAL SCREEN (Hampton Research, Riverside, CA). The crystallized PPCA protein can optionally be tested for at least one PPCA activity, and differently sized and shaped crystals are further tested for suitability for x-ray diffraction. Generally, larger crystals provide better crystallographic data than smaller crystals, and thicker crystals provide better crystallographic data than thinner crystals. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff et al., Diffraction Methods for Biological Macromolecules, Vols. 114-115, Methods in Enzymology, Academic Press, Orlando, FL (1985).
Protein Crystallization Methods 

The hanging drop method is preferably used to crystallize the purified protein. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra; Taylor et al., J. Mol. Biol. 225:1287-1290 (1992); Takimoto et al. (1992), infra; CRYSTAL SCREEN, Hampton Research.

A mixture of the purified protein and precipitant can include the following:

• pH (e.g., 7-9);

• buffer type (e.g., tromethamine (TRIZMA), sodium azide (NaN3), phosphate, sodium or cacodylate acetates, imidazole, Tris-HCl, sodium HEPES);

• buffer concentration (e.g., 1-100 mM);

• salt type (e.g., sodium azide, calcium chloride, sodium citrate, magnesium chloride, ammonium acetate, ammonium sulfate, potassium phosphate, magnesium acetate, zinc acetate, calcium acetate);

• polymer type and concentration (e.g., polyethylene glycol (PEG), 1-50%, types 400-10,000);

• other additives (salts: potassium, sodium, tartrate, ammonium sulfate, sodium acetate, lithium sulfate, sodium formate, sodium citrate, magnesium formate, sodium phosphate, potassium phosphate; organics: 2-propanol; non-volatile: 2-methyl-2,4-pentanediol; β-octyl glucoside); and

• concentration of purified PPCA or pPPCA (e.g., 1.0-100 mg/ml).
See, e.g., CRYSTAL SCREEN, Hampton Research.
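Screening "by varying at least one of" these parameters is often organized as a grid. The Python sketch below enumerates a full factorial grid and assigns it to plate wells. The specific values (chosen within the ranges listed above), the restriction to three varied parameters, and the 24-well plate layout (rows A-D, columns 1-6) are hypothetical choices, not conditions from this specification.

```python
from itertools import product

def grid_screen(ph_values, peg_percents, protein_mg_ml):
    """All combinations of the varied parameters (a full factorial grid)."""
    return [
        {"pH": ph, "PEG 8000 (%)": peg, "protein (mg/ml)": prot}
        for ph, peg, prot in product(ph_values, peg_percents, protein_mg_ml)
    ]

def plate_layout(conditions, rows="ABCD", columns=6):
    """Assign conditions to 24-well plates (rows A-D, columns 1-6), filling row by row."""
    per_plate = len(rows) * columns
    layout = []
    for i, condition in enumerate(conditions):
        plate, position = divmod(i, per_plate)
        well = f"{rows[position // columns]}{position % columns + 1}"
        layout.append((plate + 1, well, condition))
    return layout

# Hypothetical values chosen inside the ranges listed above
conditions = grid_screen(
    ph_values=[7.0, 7.5, 8.0, 8.5, 9.0],
    peg_percents=[5, 10, 15, 20, 25, 30],
    protein_mg_ml=[5, 10],
)
layout = plate_layout(conditions)
print(len(conditions), "conditions on", layout[-1][0], "plates")
print(layout[0])
```

A full grid grows multiplicatively with each added parameter, which is one reason sparse-matrix kits such as CRYSTAL SCREEN are used for initial screening before a grid is refined around a hit.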

A non-limiting example of such crystallization conditions is the following:
• purified PPCA or pPPCA protein (e.g., 5 mg/ml);

• two solutions in serial mixtures:

(1) 40-80 mM TRIZMA, 0.05-2.0 mM NaN3;

(2) 2-30% polyethylene glycol (PEG) 8000 buffered with 40-80 mM TRIZMA and 0.05-2.0 mM NaN3;

• 0.05-0.5% β-octyl glucoside;

• at an overall pH of about 8.0-8.3.
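In a hanging drop, the protein and precipitant concentrations change after set-up. A common arrangement (an assumption here, not stated in the specification) mixes equal volumes of protein and reservoir solution on a cover slip sealed over the reservoir; water then leaves the drop by vapor diffusion until its precipitant concentration approaches that of the reservoir. The Python sketch estimates the idealized concentrations; the 2 µl + 2 µl volumes and the reservoir composition (chosen within the ranges above) are hypothetical.

```python
def hanging_drop(protein_mg_ml, reservoir, precipitant, protein_ul=2.0, reservoir_ul=2.0):
    """Idealized concentrations in a hanging drop, at set-up and at equilibrium.

    reservoir maps component name -> concentration in the reservoir solution.
    Assumes the protein stock contains none of the reservoir components and that
    only water leaves the drop until `precipitant` matches its reservoir value.
    """
    total = protein_ul + reservoir_ul
    start = {name: conc * reservoir_ul / total for name, conc in reservoir.items()}
    start["protein (mg/ml)"] = protein_mg_ml * protein_ul / total
    factor = reservoir[precipitant] / start[precipitant]  # how much the drop concentrates
    equilibrium = {name: conc * factor for name, conc in start.items()}
    return start, equilibrium

# Hypothetical reservoir within the example ranges
reservoir = {"PEG 8000 (%)": 10.0, "TRIZMA (mM)": 60.0, "NaN3 (mM)": 1.0}
start, end = hanging_drop(5.0, reservoir, precipitant="PEG 8000 (%)")
print("at set-up:", start)
print("at equilibrium:", end)
```

In this idealization the protein starts at half its stock concentration and returns to it as the drop shrinks, which is how vapor diffusion slowly drives the drop toward supersaturation.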

The above mixtures are used and screened by varying at least one of pH, buffer type, buffer concentration, precipitating salt type or additive or their concentrations, PEG type, PEG concentration, and protein concentration. Crystals ranging in size from 0.1 to 0.9 mm are formed in 1-14 days. These crystals diffract x-rays to at least 10 Å resolution, such as 0.15-10.0 Å, or any range or value therein, such as 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4 or 3.5 Å, with 3.5 Å or better (i.e., a numerically smaller value) being preferred for the highest resolution. In addition to diffraction patterns having this highest resolution, lower resolution data, such as 25-3.5 Å, can also be used. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
Protein Crystals

Crystals appear after 1-14 days and continue to grow on subsequent days. Some of the crystals can optionally be removed, washed, and assayed for biological activity (e.g., PPCA activity); crystals that retain activity are preferred for further characterization. Other washed crystals are preferably run on a gel and stained, and those that migrate in the same position as the purified PPCA or pPPCA are preferably used. From two to one hundred crystals are observed in one drop, and various crystal forms can occur, such as, but not limited to, orthorhombic, bipyramidal, rhomboid, and cubic. Initial x-ray analyses indicate that such crystals diffract at moderately high to high resolution. When fewer crystals are produced in a drop, they can be much larger, e.g., 0.4-0.9 mm. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
PPCA and pPPCA X-ray Crystallography Methods

The crystals so produced for a PPCA or pPPCA are x-ray analyzed using a suitable x-ray source, and diffraction patterns are obtained. Crystals are preferably stable for at least 10 hrs in the x-ray beam. Frozen crystals (e.g., -220 to -50°C) are optionally used for longer x-ray exposures (e.g., 5-72 hrs), the crystals being relatively more stable to the x-rays in the frozen state. To collect the maximum number of useful reflections, multiple frames are optionally collected as the crystal is rotated in the x-ray beam, e.g., for 5-72 hrs. Larger crystals (>0.2 mm) are preferred, to increase the resolution of the x-ray diffraction patterns obtained. Crystals are preferably analyzed using a synchrotron high-energy x-ray source. Using frozen crystals, x-ray diffraction data are collected on crystals that diffract to at least a relatively high resolution of 10-1.5 Å (lower resolutions, such as 25-10 Å, are also useful), sufficient to solve the three-dimensional structure of a PPCA or pPPCA in considerable detail, as presented herein.

Passing an x-ray beam through a crystal produces a diffraction pattern as a result of the x-rays interacting with and being scattered by the contents of the crystal. The diffraction pattern can be visualized using, e.g., an image plate or film, resulting in an image with spots corresponding to the diffracted x-rays. The positions of the spots in the diffraction pattern are used to determine parameters intrinsic to the crystal (such as unit cell parameters) and to gain information on the packing of the molecules in the crystal. The intensities of the spots are the squared amplitudes of the Fourier transform of the molecules in the crystal, and thus carry information on each atom in the crystal and hence on the crystallized molecule.
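The relation between a spot's position and the resolution it represents follows from Bragg's law, d = λ / (2 sin θ), where 2θ is the scattering angle. The minimal sketch below assumes a flat detector perpendicular to the beam. The wavelength (Cu Kα, 1.5418 Å) and the 150 mm crystal-to-detector distance are hypothetical example values. It converts the distance of a spot from the direct-beam position into the d-spacing of that reflection.

```python
import math

def spot_resolution(radius_mm, distance_mm, wavelength_angstrom):
    """d-spacing (Å) of a spot on a flat detector normal to the beam.

    radius_mm: distance of the spot from the direct-beam position.
    distance_mm: crystal-to-detector distance.
    """
    two_theta = math.atan2(radius_mm, distance_mm)            # scattering angle 2θ
    theta = two_theta / 2.0
    return wavelength_angstrom / (2.0 * math.sin(theta))      # Bragg's law, n = 1

# Hypothetical geometry: Cu K-alpha radiation, detector 150 mm from the crystal.
for r in (20.0, 60.0, 100.0):
    print(f"spot at {r:5.1f} mm -> {spot_resolution(r, 150.0, 1.5418):.2f} Å")
```

Spots farther from the beam centre correspond to smaller d-spacings, that is, to higher-resolution information.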

After the diffraction patterns have been collected, the data are processed. This includes measuring the position and intensity of the spots on each diffraction pattern. This information is then processed (i.e., mathematical operations such as scaling, merging and converting the intensities of the diffracted beams to amplitudes are performed on the data) to yield a data set in a form that can be used for the further structure determination of the crystallized molecule. The amplitudes of the diffracted x-rays are then combined with calculated phases to produce an electron density map of the contents of the crystal. The structure of the molecules (as present in the crystal) is built into this electron density map. The phases can be determined with various known techniques, one being molecular replacement.
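The scaling, merging and intensity-to-amplitude steps can be illustrated with a deliberately simplified sketch. It assumes hypothetical reflections that have already been reduced to unique (h, k, l) indices and one hypothetical scale factor per frame. Repeated measurements are averaged, and the mean intensity is converted to an amplitude as F = √I, with negative intensities clamped to zero. Real data-reduction programs use full scaling models and a statistical treatment of weak and negative intensities (e.g., the French and Wilson method) instead of this naive square root.

```python
import math
from collections import defaultdict

# Hypothetical observations: (frame, (h, k, l), raw intensity).
observations = [
    (0, (1, 0, 0), 410.0),
    (1, (1, 0, 0), 390.0),
    (0, (0, 2, 1), 95.0),
    (1, (0, 2, 1), 105.0),
    (1, (3, 1, 2), -4.0),   # weak reflection measured slightly negative
]
# Hypothetical per-frame scale factors putting all frames on a common scale.
frame_scale = {0: 1.0, 1: 1.02}

def merge_to_amplitudes(obs, scales):
    """Scale each measurement, average repeats of each hkl, and convert I to |F|."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for frame, hkl, intensity in obs:
        sums[hkl] += intensity * scales[frame]
        counts[hkl] += 1
    amplitudes = {}
    for hkl, total in sums.items():
        mean_i = total / counts[hkl]
        amplitudes[hkl] = math.sqrt(max(mean_i, 0.0))   # naive |F| = sqrt(I)
    return amplitudes

for hkl, f in sorted(merge_to_amplitudes(observations, frame_scale).items()):
    print(hkl, round(f, 2))
```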

For the molecular replacement technique, a known three-dimensional structure thought to share structural homology with the structure to be determined is used to calculate a first set of initial phases. These phases are then combined with the diffraction data of the molecule whose structure is to be solved. The result is an electron density map of the molecules in the crystal from which the diffraction patterns originate.
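The central idea of molecular replacement is to take phases from a related known structure and combine them with the observed amplitudes. This can be shown in one dimension. The following is a toy sketch, not a molecular replacement program. It assumes the search model is already correctly oriented and positioned, which is the problem the rotation and translation searches solve. It also uses equal point atoms in a periodic 1-D cell, and all atom positions are hypothetical.

```python
import numpy as np

def structure_factors(positions, h):
    """F(h) = sum_j exp(2*pi*i*h*x_j) for equal point atoms at fractional x_j."""
    x = np.asarray(positions, dtype=float)
    return np.exp(2j * np.pi * np.outer(h, x)).sum(axis=1)

def fourier_synthesis(amplitudes, phases, h, n_grid=200):
    """rho(x) = sum_h |F(h)| cos(2*pi*h*x - phi(h)) sampled on n_grid points."""
    x = np.arange(n_grid) / n_grid
    terms = amplitudes[:, None] * np.cos(2 * np.pi * np.outer(h, x) - phases[:, None])
    return x, terms.sum(axis=0)

h = np.arange(20)                        # reflection indices 0..19
true_atoms = [0.10, 0.32, 0.55, 0.81]    # hypothetical "unknown" structure
model_atoms = [0.10, 0.32, 0.55]         # hypothetical related model, one atom missing

f_true = structure_factors(true_atoms, h)
f_obs = np.abs(f_true)                                    # only amplitudes are measured
phi_model = np.angle(structure_factors(model_atoms, h))   # phases taken from the model

x, rho_mr = fourier_synthesis(f_obs, phi_model, h)            # map with model phases
_, rho_ideal = fourier_synthesis(f_obs, np.angle(f_true), h)  # reference: true phases

corr = np.corrcoef(rho_mr, rho_ideal)[0, 1]
print(f"correlation with the ideal map: {corr:.3f}")
print(f"density at the unmodelled atom (x = 0.81): {rho_mr[162]:.2f}")
```

Because the observed amplitudes come from the complete structure, the map computed with model phases can show density for parts that the search model lacks. This is what allows model building to proceed beyond the starting homologue.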

The phases can be further optimized using a technique called density modification, which produces electron density maps of better quality, facilitating interpretation and model building. The atomic model is then refined by allowing the atoms in the model to move so as to match the diffraction data as closely as possible while continuing to satisfy stereochemical constraints (sensible bond lengths, bond angles and the like). See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
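Density modification covers several techniques. One of the simplest is solvent flattening. The sketch below shows only that idea on a hypothetical 1-D map: points outside an assumed molecular envelope are set to their mean value, which suppresses noise in the solvent region. The iterative cycle used in practice (back-transforming the modified map and recombining phases) is omitted.

```python
import numpy as np

def solvent_flatten(rho, protein_mask):
    """Return a copy of rho in which every solvent point is set to the solvent mean."""
    flattened = np.array(rho, dtype=float)
    solvent = ~protein_mask
    flattened[solvent] = flattened[solvent].mean()
    return flattened

x = np.linspace(0.0, 1.0, 100, endpoint=False)
clean = np.exp(-((x - 0.3) / 0.05) ** 2)          # hypothetical protein density
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.2, x.size)      # noisy experimental map
mask = (x > 0.15) & (x < 0.45)                    # assumed molecular envelope

improved = solvent_flatten(noisy, mask)
for label, m in (("before", noisy), ("after", improved)):
    print(label, round(float(np.sqrt(np.mean((m - clean) ** 2))), 3))
```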
Computer Related Embodiments 

An amino acid sequence of a PPCA or pPPCA and/or atomic coordinate/x-ray diffraction data, useful for computer structure determination of a PPCA, pPPCA or a portion thereof, can be "provided" in a variety of media to facilitate use thereof. As used herein, "provided" refers to a manufacture which contains a PPCA or pPPCA amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention, e.g., the amino acid sequence provided
in Figures 13-15, a representative fragment thereof, or an amino acid sequence having at least 8f>100% overall identity 
to a 5-542 amino acid fragment of an amino acid sequence of Figures 13-15. Such a manufacture provides the amino acid sequence and/or atomic coordinate/x-ray diffraction data in a form which allows a skilled artisan to analyze and determine the three-dimensional structure of a PPCA, a pPPCA or a subdomain thereof.
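Overall identity between two sequences is usually computed from an alignment. The minimal sketch below assumes that the two sequences are already aligned (gaps written as '-'). It uses one common convention: identical positions divided by the positions where neither sequence has a gap. Other conventions divide by the alignment length or by the length of the shorter sequence. The fragments are hypothetical and are not taken from Figures 13-15.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue                 # gap columns are not counted
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0

# Hypothetical aligned fragments (10 identical out of 11 compared columns).
print(round(percent_identity("MKVLSAGT-LLA", "MKVISAGTQLLA"), 1))
```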

In one application of this embodiment, PPCA, pPPCA, or at least one subdomain thereof, amino acid sequence 
and/or atomic coordinate/x-ray diffraction data of the present invention is recorded on computer readable media. As 
used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. 
Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and 
magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; 
and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any 
of the presently known computer readable media can be used to create a manufacture comprising computer readable 
medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present 
invention. 

As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled 
artisan can readily adopt any of the presently known methods for recording information on computer readable medium 
to generate manufactures comprising an amino acid sequence and/or atomic coordinate/x-ray diffraction data information 
of the present invention. 

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium 
having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention. 
The choice of the data storage structure will generally be based on the means chosen to access the stored information. 
In addition, a variety of data processor programs and formats can be used to store the sequence and x-ray data 
information of the present invention on computer readable medium. The sequence information can be represented in 
a word processing text file, formatted in commercially-available software such as WordPerfect and MICROSOFT Word, 
or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. 
A skilled artisan can readily adapt any number of data processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the information of the present invention.
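One concrete form of the "ASCII file" option is to write sequence data in FASTA format and atomic coordinates as fixed-column PDB ATOM records. The sketch below shows one possible formatting and makes no claim about how the data of the invention were stored. The residue, coordinates and sequence are hypothetical, and the atom name is assumed to be pre-padded to four characters following the PDB convention (e.g., " CA ").

```python
def fasta_record(name: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA entry, wrapping the sequence at `width` characters."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"

def pdb_atom_line(serial, name, res_name, chain, res_seq, x, y, z,
                  occupancy=1.00, b_factor=20.00, element=""):
    """Format a fixed-column PDB ATOM record (columns 1-78)."""
    return (f"ATOM  {serial:5d} {name:<4} {res_name:>3} {chain:1}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
            f"          {element:>2}")

atoms = [  # hypothetical coordinates, not data of the invention
    (1, " N  ", "SER", "A", 1, 12.345, 4.210, -7.800, "N"),
    (2, " CA ", "SER", "A", 1, 13.102, 5.448, -7.512, "C"),
]
for serial, name, res, chain, seq, x, y, z, element in atoms:
    print(pdb_atom_line(serial, name, res, chain, seq, x, y, z, element=element))
print(fasta_record("hypothetical_fragment", "MKVLSAGTLLA" * 7), end="")
```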

By providing computer readable media having stored therein a PPCA or pPPCA sequence and/or atomic coordinates based on x-ray diffraction data, a skilled artisan can routinely access the sequence and atomic coordinate or x-ray diffraction data to model a PPCA, pPPCA, a subdomain thereof, or a ligand thereof. Computer algorithms are publicly and commercially available which allow a skilled artisan to access this data provided on a computer readable medium and analyze it for structure determination and/or RDD. See, e.g., Biotechnology Software Directory, Mary Ann Liebert Publ., New York (1995).

The present invention further provides systems, particularly computer-based systems, which contain the sequence and/or diffraction data described herein. Such systems are designed to do structure determination and RDD for a PPCA, pPPCA or at least one subdomain thereof. Non-limiting examples are microcomputer workstations available from Silicon Graphics Incorporated and Sun Microsystems running Unix-based, Windows NT or IBM OS/2 operating systems.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate which of the currently available computer-based systems are suitable for use in the present invention. A monitor is optionally provided to visualize structure data.

As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a PPCA, pPPCA or fragment sequence and/or atomic coordinate/x-ray diffraction data of the present invention and the necessary hardware means and software means for supporting and implementing an analysis means. As used herein, "data storage means" refers to memory which can store sequence or atomic coordinate/x-ray diffraction data of the present invention, or a memory access means which can access manufactures having recorded thereon the sequence or x-ray data of the present invention.

As used herein, "search means" or "analysis means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence or x-ray data stored within the data storage means. Search means are used to identify fragments or regions of a PPCA or pPPCA which match a particular target sequence or target motif. A variety of known algorithms are publicly disclosed, and a variety of commercially available software for conducting search means can be used in the computer-based systems of the present invention. A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting computer analyses can be adapted for use in the present computer-based systems.
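A sequence search means can be as simple as a pattern match. The minimal sketch below assumes a PROSITE-style pattern written as a regular expression. The G-x-S-x-G pattern (regex "G.S.G") is the consensus found around the nucleophilic serine of many α/β-hydrolase-fold enzymes and serves only as an example target motif. The query sequence is hypothetical.

```python
import re

def find_motif(sequence: str, pattern: str):
    """Return (1-based start, matched text) for every, possibly overlapping, match."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# Hypothetical query sequence; "G.S.G" is the regex form of G-x-S-x-G.
query = "MKTAYGESYAGHYVPQLGASAGKL"
print(find_motif(query, "G.S.G"))
```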

As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration or electron density map which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzymic active sites, structural subdomains, epitopes, functional domains and signal sequences. A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention.

A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify structural motifs or interpret electron density maps derived in part from the atomic coordinate/x-ray diffraction data. A skilled artisan can readily recognize that any one of the publicly available computer modeling programs can be used as the search means for the computer-based systems of the present invention.
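A common way to compare a target structural motif with stored coordinates is to superpose the two optimally and then compute the r.m.s. deviation. The sketch below uses the Kabsch (SVD-based) superposition, which is one standard choice among several. The motif coordinates are hypothetical, and the two atom sets are assumed to be already paired one-to-one.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between paired N x 3 point sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)                      # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)           # SVD of the 3 x 3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))      # -1 would indicate a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q                   # rotate p onto q
    return float(np.sqrt((diff ** 2).sum() / len(p)))

# Hypothetical four-atom motif, and the same motif rotated 40° about z and shifted.
motif = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                  [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
candidate = motif @ rot_z.T + np.array([5.0, -2.0, 3.0])
print(f"RMSD after superposition: {kabsch_rmsd(motif, candidate):.4f} Å")
```

A rigidly moved copy of the motif superposes with an RMSD of essentially zero. Geometric differences between the two sets remain as a non-zero RMSD.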

One application of this embodiment is provided in Figure 22, which provides a block diagram of a computer system 102 that can be used to implement the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM), a variety of secondary storage memory 110, such as a hard drive 112 and a removable storage medium 114, and a monitor 120. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable medium storage device 114 once inserted in the removable medium storage device 114.

Amino acid, encoding nucleotide or other sequence and/or atomic coordinate/x-ray diffraction data of the present invention may be stored in a well-known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage device 116. Software for accessing and processing the amino acid sequence and/or atomic coordinate/x-ray diffraction data (such as search tools, comparing tools, etc.) resides in main memory 108 during execution. The monitor 120 is optionally used to visualize the structure data.
Structure Determination 

One or more computational steps, computer programs and/or computer algorithms are used to build a molecular 3-D model of a PPCA or pPPCA, using amino acid sequence data from Figures 13-15 (or variants thereof) and/or atomic coordinate/x-ray diffraction data, as presented herein.

In x-ray crystallography, x-ray diffraction data and phases are combined to produce electron density maps in 
which the three-dimensional structure of a PPCA or pPPCA is then built or modeled. This structure can then be used 
for RDD of modulators of at least one PPCA- or pPPCA-related activity that is relevant to at least one PPCA- or 
pPPCA-related pathology. 

Density Modification and Map Interpretation. Electron density maps can be calculated using programs such as those from the CCP4 computing package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory, UK, 1979). Cycles of two-fold averaging can further be used, such as with the program RAVE (Kleywegt & Jones, in Bailey et al., eds., First Map to Final Model, SERC Daresbury Laboratory, UK, pp. 59-66 (1994)), together with gradual model expansion. For map visualization and model building, a program such as "O" (Jones (1991), infra) can be used.
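Two-fold averaging relies on the fact that two copies of the molecule in the asymmetric unit should have the same density. The sketch below is deliberately simplified. It assumes a 2-D map whose non-crystallographic two-fold axis passes through the grid centre perpendicular to the map, so the symmetry operator reduces to a 180° rotation of the array. Averaging programs used in practice handle general operators, masks and interpolation. The density and noise level are hypothetical.

```python
import numpy as np

def twofold_average(rho):
    """Average a 2-D map with its copy rotated 180 degrees about the grid centre."""
    return 0.5 * (rho + np.rot90(rho, 2))

def gaussian_blob(n, cx, cy, width=20.0):
    """Hypothetical spherical density blob centred on grid point (cx, cy)."""
    y, x = np.mgrid[0:n, 0:n]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / width)

def rms_error(m, reference):
    return float(np.sqrt(np.mean((m - reference) ** 2)))

n = 64
# One molecule plus its copy related by the assumed two-fold (180 degree) axis.
true_map = gaussian_blob(n, 20, 24) + gaussian_blob(n, n - 1 - 20, n - 1 - 24)
rng = np.random.default_rng(1)
observed = true_map + rng.normal(0.0, 0.3, (n, n))   # noisy experimental map
averaged = twofold_average(observed)

print("rms error before averaging:", round(rms_error(observed, true_map), 3))
print("rms error after averaging: ", round(rms_error(averaged, true_map), 3))
```

Each averaged point combines two independent noisy measurements of the same density. The noise variance therefore roughly halves, while the true signal is unchanged.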

Refinement and Model Validation. Rigid-body and positional refinement can be carried out using a program such as X-PLOR (Brünger (1992), infra), e.g., with the stereochemical parameters of Engh and Huber (Acta Cryst. A47:392-400 (1991)). If the model at this stage in the averaged maps still misses residues (e.g., at least 5-10 per subunit), some or all of the missing residues can be incorporated in the model during additional cycles of positional refinement and model building. The refinement procedure can start using data at lower resolution (e.g., 25-10 Å to 10-3.0 Å) and then be gradually extended to include data from 12-6 Å to 3.0-1.5 Å. B-values (also termed temperature factors) for individual atoms can be refined once data of 2.8 Å or higher (e.g., up to 1.5 Å) have been added. Subsequently, waters can be gradually added. A program such as ARP (Lamzin and Wilson, Acta Cryst. D49:129-147 (1993)) can be used to add crystallographic waters and as a tool to check for bad areas in the model. Programs such as PROCHECK (Laskowski et al., J. Appl. Cryst. 26:283-291 (1993)), WHATIF (Vriend, J. Mol. Graph. 8:52-56 (1990)) and PROFILE 3D (Lüthy et al., Nature 356:83-85 (1992)), as well as the geometrical analysis generated by X-PLOR, can be used to check the structure for errors. A program such as DSSP can be used to assign the secondary structure elements (Kabsch and Sander (1983), infra).
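One of the geometric checks described above is the r.m.s. deviation of bond lengths from ideal values. The minimal sketch below shows how it is computed. The ideal backbone bond lengths are approximately those tabulated by Engh and Huber and are included for illustration only. The residue coordinates are hypothetical.

```python
import numpy as np

# Backbone bond targets in Å, approximately the Engh & Huber (1991) values.
IDEAL = {("N", "CA"): 1.458, ("CA", "C"): 1.525, ("C", "O"): 1.231}

def rms_bond_deviation(atoms, bonds):
    """atoms: name -> (x, y, z); bonds: list of (atom1, atom2, ideal_length)."""
    deltas = [np.linalg.norm(np.subtract(atoms[a], atoms[b])) - ideal
              for a, b, ideal in bonds]
    return float(np.sqrt(np.mean(np.square(deltas))))

# Hypothetical coordinates for one residue of a model.
residue = {"N": (0.000, 0.000, 0.000), "CA": (1.470, 0.000, 0.000),
           "C": (2.010, 1.420, 0.000), "O": (1.300, 2.430, 0.000)}
bonds = [(a, b, d) for (a, b), d in IDEAL.items()]
print(f"r.m.s. bond deviation: {rms_bond_deviation(residue, bonds):.3f} Å")
```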

The structure of a PPCA or pPPCA can thus be solved with the molecular replacement procedure, such as by using X-PLOR (Brünger (1992), infra). A partial search model for the monomer can be constructed using a related protein, such as the wheat serine carboxypeptidase structure (Liao et al. (1992), infra). The rotation and translation functions can be solved to yield orientations and positions for the subunits in the crystallographic asymmetric unit. This allows phases to be determined that, when combined with information from the x-ray diffraction patterns, allow electron density maps of a PPCA or pPPCA to be calculated. The atomic model is then built using these electron density maps. Cyclical two-fold density averaging can also be done to improve the electron density maps using a suitable program (e.g., RAVE), and model expansion can also be used to add missing residues for each monomer, resulting in a model with 95-99.9% of the total number of residues. The model can be refined in a program such as X-PLOR (Brünger (1992), supra) to a suitable crystallographic R-factor. The model data are then saved on computer readable media for use in further analysis, such as rational drug design.
Rational Design of Drugs that Interact with the PPCA or pPPCA

The determination of the three-dimensional structure of a PPCA or pPPCA, as described herein, provides a 
basis for the design of new and specific ligands for the diagnosis and/or treatment of at least one PPCA- or pPPCA- 
related pathology. 

Several approaches can be taken for the use of the crystal structure of a PPCA or pPPCA in the rational design of ligands of this protein. A computer-assisted, manual examination of the active site structure is optionally done. Software such as GRID (Goodford, J. Med. Chem. 28:849-857 (1985)), a program that determines probable interaction sites between probes with various functional group characteristics and the enzyme surface, is used to analyze the active site to determine structures of inhibiting compounds. The program calculations, with suitable inhibiting groups on molecules (e.g., protonated primary amines) as the probe, are used to identify potential hotspots around accessible positions at suitable energy contour levels. Suitable ligands, as inhibiting or stimulating modulating compounds or compositions, are then tested for modulating activities of at least one PPCA or pPPCA.
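The GRID approach evaluates a probe's interaction energy at every point of a lattice around the protein and keeps the points that lie below an energy contour. The sketch below illustrates that approach but does not reproduce GRID's energy function. It assumes a simple Lennard-Jones 12-6 term plus a Coulomb term with a distance-dependent dielectric (ε = 4r). The active-site atoms, charges, parameters and the -3 kcal/mol contour level are all hypothetical.

```python
import numpy as np

# Hypothetical active-site atoms: x, y, z (Å) and partial charge (e).
SITE_ATOMS = np.array([
    [0.0, 0.0, 0.0, -0.5],   # carboxylate oxygen
    [1.2, 0.8, 0.0, -0.5],   # carboxylate oxygen
    [0.6, -1.0, 0.5, 0.0],   # neutral carbon
])
PROBE_CHARGE = 1.0             # e.g. a protonated primary amine probe
EPS_LJ, SIGMA_LJ = 0.15, 3.2   # hypothetical Lennard-Jones parameters (kcal/mol, Å)

def probe_energy(point):
    """Lennard-Jones 12-6 plus Coulomb energy (dielectric 4r), in kcal/mol."""
    r = np.linalg.norm(SITE_ATOMS[:, :3] - point, axis=1)
    r = np.maximum(r, 0.5)                     # guard against division by ~0
    lj = 4.0 * EPS_LJ * ((SIGMA_LJ / r) ** 12 - (SIGMA_LJ / r) ** 6)
    coulomb = 332.0 * PROBE_CHARGE * SITE_ATOMS[:, 3] / (4.0 * r * r)
    return float(np.sum(lj + coulomb))

def hotspots(spacing=0.5, extent=5.0, contour=-3.0):
    """Return (energy, point) for lattice points below the contour, best first."""
    axis = np.arange(-extent, extent + 1e-9, spacing)
    found = []
    for gx in axis:
        for gy in axis:
            for gz in axis:
                energy = probe_energy(np.array([gx, gy, gz]))
                if energy < contour:
                    found.append((energy, (float(gx), float(gy), float(gz))))
    return sorted(found)

spots = hotspots()
print(len(spots), "lattice points below contour; best:", spots[0] if spots else None)
```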

A diagnostic or therapeutic PPCA or pPPCA modulating ligand of the present invention can be, but is not limited to, at least one selected from a nucleic acid, a compound, a protein, an element, a lipid, an antibody, a saccharide, an isotope, a carbohydrate, an imaging agent, a lipoprotein, a glycoprotein, an enzyme, a detectable probe, and an antibody or fragment thereof, or any combination thereof, which can be detectably labeled as for labeling antibodies. Such labels include, but are not limited to, enzymatic labels, radioisotopes or radioactive compounds or elements, fluorescent compounds or metals, chemiluminescent compounds and bioluminescent compounds. Alternatively, any other known diagnostic or therapeutic agent can be used in a method of the invention.

After preliminary experiments are done to determine the K_m of the substrate for each enzyme activity of a PPCA or pPPCA, the time-dependent nature of modulation and the ligand K_i values are determined (e.g., by the method of Henderson (Biochem. J. 127:321-333 (1972))). For example, the ligand (or blank where appropriate) and enzyme are pre-incubated in buffer. Reactions are initiated by the addition of substrate. Aliquots are removed over a suitable time course, and each is quenched by addition of a suitable quenching solution (e.g., sodium hydroxide in aqueous ethanol). The concentration of product is determined, e.g., fluorometrically, using a spectrometer. Plots of fluorescence against time can be close to linear over the assay period, and are used to obtain values for the initial velocity in the presence (v_i) or absence (v_0) of ligand. Because error is present in both axes of a Henderson plot, the plot is inappropriate for standard regression analysis (Leatherbarrow, Trends Biochem. Sci. 15:455-458 (1990)). Therefore, K_i values are obtained from the data by fitting to a modified version of the Henderson equation for competitive inhibition:
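$$Q r^2 + (E - Q - I)\,r - E = 0$$

where (using the notation of Henderson (Biochem. J. 127:321-333 (1972))) E and I are the total enzyme and ligand concentrations, A_t is the total substrate concentration, K_A is the Michaelis constant of the substrate, and

$$Q = K_i\left(1 + \frac{A_t}{K_A}\right), \qquad r = \frac{v_0}{v_i}$$

This equation is solved for the positive root with the constraint that

$$Q = K_i\,\frac{A_t + K_A}{K_A}$$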

using PROC NLIN from SAS (SAS Institute Inc., Cary, North Carolina, USA), which performs nonlinear regression using least-squares techniques. The iterative method used is optionally the multivariate secant method, which is similar to the Gauss-Newton method except that the derivatives in the Taylor series are estimated from the history of iterations rather than
supplied analytically. A suitable convergence criterion is optionally used, e.g., where there is a change in loss function 
of less than I OA 
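The fit can also be sketched without SAS. The version below replaces the secant (DUD) method with a different optimizer, a one-dimensional golden-section search over log K_i that minimizes the sum of squared residuals in v_i. For each ligand concentration it computes r = v_0/v_i as the positive root of Q r^2 + (E - Q - I) r - E = 0, with Q = K_i (A_t + K_A)/K_A. All concentrations are hypothetical, and the data are noise-free values simulated from an assumed K_i, so the sketch only demonstrates the mechanics.

```python
import math

def ratio_v0_over_vi(ki, e_t, i_t, a_t, k_a):
    """Positive root r = v0/vi of Q r^2 + (E - Q - I) r - E = 0.

    Q = Ki (A_t + K_A) / K_A. The product of the roots is -E/Q < 0,
    so exactly one root is positive.
    """
    q = ki * (a_t + k_a) / k_a
    b = e_t - q - i_t
    return (-b + math.sqrt(b * b + 4.0 * q * e_t)) / (2.0 * q)

def fit_ki(inhibitor, vi_obs, v0, e_t, a_t, k_a, lo=1e-3, hi=1e3, iters=200):
    """Least-squares Ki via golden-section search over log10(Ki)."""
    def sse(log_ki):
        ki = 10.0 ** log_ki
        return sum((vi - v0 / ratio_v0_over_vi(ki, e_t, i_t, a_t, k_a)) ** 2
                   for i_t, vi in zip(inhibitor, vi_obs))

    g = (math.sqrt(5.0) - 1.0) / 2.0       # inverse golden ratio
    a, b = math.log10(lo), math.log10(hi)
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(c) < sse(d):                # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                              # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 10.0 ** ((a + b) / 2.0)

# Hypothetical assay: enzyme, ligand and Ki in nM; substrate and K_A in µM.
E_T, A_T, K_A, V0 = 1.0, 50.0, 25.0, 100.0
TRUE_KI = 2.0                              # used only to simulate the data
inhibitor = [0.5, 1.0, 2.0, 4.0, 8.0]
vi = [V0 / ratio_v0_over_vi(TRUE_KI, E_T, i, A_T, K_A) for i in inhibitor]

estimate = fit_ki(inhibitor, vi, V0, E_T, A_T, K_A)
print(f"fitted K_i = {estimate:.4f} nM")
```

Searching over log K_i keeps the bracket meaningful across several orders of magnitude. Because every predicted v_i increases monotonically with K_i, the sum of squares has a single minimum for the golden-section search to find.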
Once modulating ligands are found and isolated or synthesized, crystallographic studies of the compounds complexed to a PPCA or pPPCA can be performed. As a non-limiting example, PPCA or pPPCA crystals are soaked for 2 days in 0.01-100 mM ligand, and x-ray diffraction data are collected on an area detector and/or an image plate detector (e.g., a Mar image plate detector) using a rotating anode x-ray source. Data are collected to as high a resolution as possible, e.g., an inner limit of diffraction of 1.5-3.5 Å. An atomic model of the inhibitor is built into the difference Fourier map. The model can be refined to adjust the atomic positions to improve the fit with the electron density maps, while maintaining correct stereochemical constraints. The model will preferably have low r.m.s. deviations from ideal bond lengths and bond angles, as well as a low R-factor (preferably less than about 25-35%, such as less than about 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, or 25%).
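The R-factor cited above measures how well the model reproduces the observed data. It is conventionally R = Σ||F_obs| - k|F_calc|| / Σ|F_obs|, where k places the calculated amplitudes on the scale of the observed ones. The minimal sketch below uses hypothetical amplitudes and takes the least-squares scale factor as one simple choice for k.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|), k = least-squares scale factor."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    k = np.dot(f_obs, f_calc) / np.dot(f_calc, f_calc)   # minimizes sum((Fo - k Fc)^2)
    return float(np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs))

# Hypothetical amplitudes; the calculated set is on an arbitrary different scale.
f_obs = [120.0, 85.0, 40.0, 210.0, 15.0]
f_calc = [60.0, 44.0, 17.0, 101.0, 9.0]
r = r_factor(f_obs, f_calc)
print(f"R = {100 * r:.1f}%")
```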

Direct measurements of enzyme inhibition provide further confirmation that the modeled ligands are modulators of at least one biological activity of a PPCA or a pPPCA. As a non-limiting example, a modification (Chong et al., Biochim. Biophys. Acta 1077:65-71 (1991)) of the fluorometric assay of Potier et al. (Anal. Biochem. 94:287-296 (1979)) is optionally used to measure neuraminidase inhibition or stimulation, optionally including determination of inhibition constants (K_i). Other suitable PPCA activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol. Chem. 266:14754-14762 (1991)), endothelin-1 deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)) and tachykinin deamidase activity (Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)).

Ligands of a PPCA or pPPCA, based on the crystal structure of this enzyme, are thus also provided by the present invention. A PPCA or pPPCA ligand is any molecule, compound or composition that is capable of associating with a PPCA or pPPCA and optionally modulating at least one function or structural feature of a PPCA or pPPCA. Preferably, a PPCA or pPPCA ligand modulates at least one biological activity of a PPCA or pPPCA. Demonstration of clinically useful levels of activity, e.g., in vivo activity, is also important. In evaluating PPCA or pPPCA inhibitors for biological activity, animal models (e.g., rat, mouse, rabbit) and various oral and parenteral routes of administration are used. Using this approach, modulation of a PPCA or pPPCA is expected to occur in suitable animal models using the ligands discovered by structure determination and x-ray crystallography.
Evaluation of Therapeutic Potentials of Compositions via a PPCA Animal Model 

The present invention also provides methods for identifying diagnostic or therapeutic ligands of PPCA or pPPCA via computer RDD, to treat a PPCA-related pathology. Generally, a method for determining the therapeutic or diagnostic use of a PPCA or pPPCA modulating ligand to treat a PPCA related pathology comprises the steps of administering a known dose of at least one ligand-containing composition to an animal model having a phenotype corresponding to a PPCA-related pathology, monitoring the appropriate biological or biochemical parameters, and comparing the results in treated animals with those in untreated animals. Results indicating the onset or presence of a PPCA related pathology are generally referred to herein as "symptoms" of the disease. See, e.g., U.S. Appl. No. 08/397,693, filed March 2, 1995, which is entirely incorporated herein by reference.

Appropriate biological and biochemical parameters that reflect the onset and progression of a PPCA related pathology include, but are not limited to, (1) gross biological parameters, e.g., physical appearance (i.e., flattening of the face, rough haircoat and/or subcutaneous swelling in affected animals) or growth (reduced weight gain); (2) gross behavioral parameters, e.g., lack of coordination; (3) biochemical assays, e.g., assays of cathepsin A, N-acetyl-α-neuraminidase or β-galactosidase activities in primary cultures of skin fibroblasts or tissue homogenates; (4) histopathological studies (visceromegaly, i.e., enlarged liver and spleen; accumulation of secondary vacuoles in kidney tissues; etc.).

A first method of evaluating the therapeutic potential of a composition using the transgenic non-human animals 
of the invention comprises the steps of: 

(1) Administering a known dose of the composition to a first non-human animal having a phenotype corresponding to a human PPCA related pathology;

(2) Detecting the time of onset of symptoms in the first non-human animal; and

(3) Comparing the time of onset of symptoms in the first non-human animal to the time of onset of symptoms in a second non-human animal having a phenotype corresponding to a human PPCA related pathology, which has not been exposed to the composition;
wherein a statistically significant delay in the time of onset of symptoms in the first non-human animal relative to the time of onset of the symptoms in the second non-human animal indicates the potential of the composition for treating a PPCA related pathology.
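The criterion of a "statistically significant delay" requires a choice of statistical test. The sketch below uses one option among many, an exact one-sided permutation test on the difference in mean onset time. It assumes a 5% significance level and hypothetical onset times in days for five treated and five untreated animals.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(treated, untreated):
    """Exact one-sided p-value for mean(treated) - mean(untreated) >= observed."""
    pooled = list(treated) + list(untreated)
    n_t = len(treated)
    observed = mean(treated) - mean(untreated)
    extreme = total = 0
    # Enumerate every way of labelling n_t of the pooled animals as "treated".
    for idx in combinations(range(len(pooled)), n_t):
        chosen = set(idx)
        group_t = [pooled[i] for i in chosen]
        group_u = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(group_t) - mean(group_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total

treated_onset = [41, 45, 39, 48, 44]     # hypothetical days to onset, composition given
untreated_onset = [30, 34, 29, 36, 31]   # hypothetical days to onset, untreated
p = permutation_p_value(treated_onset, untreated_onset)
print(f"p = {p:.4f}; significant delay at 5%: {p < 0.05}")
```

Exact enumeration is practical only for small groups (here 252 relabellings). For larger groups, random resampling of the labels or a survival-analysis method would be more usual.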

A second method of evaluating the therapeutic potential of a composition using the non-human animals of the 
invention comprises the steps of: 

(1) Administering a known dose of the composition to a first non-human animal having a phenotype corresponding to a human PPCA related pathology at an initial time, t_0;

(2) Determining the extent of symptoms in the first non-human animal at a later time, t_1; and

(3) Comparing, at t_1, the extent of symptoms in the first non-human animal to the extent of symptoms in a second non-human animal having a phenotype corresponding to a human PPCA related pathology, which has not been exposed to the composition at t_0;

wherein a statistically significant decrease in the extent of symptoms at t_1 in the first non-human animal relative to the extent of the symptoms at t_1 in the second non-human animal indicates the potential of the composition for treating a PPCA related pathology.

In the above methods, the composition being tested may comprise a chemical compound administered by circulatory injection or oral ingestion. The composition being evaluated may alternatively comprise a polypeptide administered by circulatory injection of an isolated or recombinant bacterium or virus that is live or attenuated, wherein the polypeptide is present on the surface of the bacterium or virus prior to injection, or a polypeptide administered by circulatory injection of an isolated or recombinant bacterium or virus capable of reproduction within a non-human animal, wherein the polypeptide is produced within the non-human animal by genetic expression of a DNA sequence encoding the polypeptide. Alternatively, the composition being evaluated may comprise one or more nucleic acids, including a gene from the human genome or a processed RNA transcript thereof. Similarly, the composition being evaluated may comprise cells removed from a mammal and genetically engineered to overexpress a lysosomal protein or some other therapeutic polypeptide.

Once the PPCA modulating ligand has been shown to be effective in an animal model, it can then be tested in 
human clinical trials, according to known method steps. 

In the above methods, delivery of the composition being tested to non-human animals is achieved via means appropriate for the composition being tested, e.g., by diet; by intermittent or continuous intravenous injection of one or more of the compositions or of a liposome (Rahman and Schein, in Liposomes as Drug Carriers, Gregoriadis, ed., John Wiley, New York (1988), pages 381-400; Gabizon, A., in Drug Carrier Systems, Vol. 9, Roerdink et al., eds., John Wiley, New York (1989), pages 185-212) or microparticle (Jke et al., U.S. Patent 4,542,025 (Sep. 17, 1985)) formulation comprising one or more of the compositions; via subdermal implantation of drug-polymer conjugates (Duncan, R., Anti-Cancer Drugs 3:175-210 (1992)); via microparticle bombardment (Sanford et al., U.S. Patent 4,945,050 (Jul. 31, 1990)); via infusion pumps (Blackshear and Rohde, in Drug Carrier Systems, Vol. 9, Roerdink et al., eds., John Wiley, New York (1989), pages 293-310); or by other appropriate means known in the art (see, generally, Remington's Pharmaceutical Sciences, 18th Ed., Gennaro, ed., Mack Publishing Co., Easton, PA (1990)).

Pharmaceutical/Diagnostic Administration

Using compounds or compositions comprising at least one PPCA or pPPCA modulating ligand, the present invention further provides a method for modulating the activity of a PPCA or pPPCA protein in a cell. In general, ligands (antagonists or agonists) which have been identified to inhibit or enhance the activity of at least one PPCA or pPPCA can be formulated so that the ligand can be contacted with a cell expressing at least one PPCA or pPPCA protein in vivo. The contacting of such a cell with such a ligand results in the in vivo modulation of at least one biological activity of a PPCA or pPPCA.

At least one PPCA or pPPCA modulating compound or composition of the invention can be administered by 
any means that achieve the intended purpose, using a suitable pharmaceutical composition or formulation. For example, 
administration can be by various parenteral routes such as subcutaneous, intravenous, intradermal, intramuscular, 
intraperitoneal, intranasal, intracranial, transdermal, or buccal routes. Alternatively, or concurrently, administration can 
be by the oral route. Parenteral administration can be by bolus injection or by gradual perfusion over time. 

A typical regimen for treatment or prophylaxis comprises administration of an effective amount over a period of one or several days, up to and including between one week and about six months. It is understood that the dosage of a diagnostic/pharmaceutical compound or composition of the invention administered in vivo or in vitro will be dependent upon the age, sex, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of treatment, and the nature of the diagnostic/pharmaceutical effect desired. The ranges of effective doses provided herein are not intended to be limiting and represent preferred dose ranges. However, the most preferred dosage will be tailored to the individual subject, as is understood and determinable by one skilled in the relevant arts. See, e.g., Berkow et al., eds., The Merck Manual, 16th edition, Merck and Co., Rahway, N.J. (1992); Goodman et al., eds., Goodman and Gilman's The Pharmacological Basis of Therapeutics, 8th edition, Pergamon Press, Inc., Elmsford, N.Y. (1990); Avery's Drug Treatment: Principles and Practice of Clinical Pharmacology and Therapeutics, 3rd edition, ADIS Press, Ltd., Williams and Wilkins, Baltimore, MD (1987); Ebadi, Pharmacology, Little, Brown and Co., Boston (1985); Osol et al., eds., Remington's Pharmaceutical Sciences, 18th edition, Mack Publishing Co., Easton, PA (1990); Katzung, Basic and Clinical Pharmacology, Appleton and Lange, Norwalk, CT (1992), which references are entirely incorporated herein by reference.

The total dose required for each treatment can be administered by multiple doses or in a single dose. The
diagnostic/pharmaceutical compound or composition can be administered alone or in conjunction with other diagnostics
and/or pharmaceuticals directed to the pathology, or directed to other symptoms of the pathology. Effective amounts
of a diagnostic/pharmaceutical compound or composition of the invention are from about 0.1 µg to about 100 mg/kg
body weight, administered at intervals of 4-72 hours, for a period of 2 hours to 1 year, and/or any range or value therein.

The recipients of administration of compounds and/or compositions of the invention can be any mammals.
Among mammals, the preferred recipients are mammals of the Orders Primates (including humans, apes and monkeys),
Artiodactyla (including horses, goats, cows, sheep, pigs), Rodentia (including mice, rats, rabbits, and hamsters), and
Carnivora (including cats and dogs). The most preferred recipients are humans.

Having now generally described the invention, the same will be more readily understood through reference
to the following examples, which are provided by way of illustration and are not intended to be limiting of the present
invention.

Example 1: Preparation, Purification and Crystallization of PPCA or pPPCA from Human Cells

The present invention provides, in one aspect, the determination of the three-dimensional structure of the human
protective protein/cathepsin A (PPCA) in its precursor form (pPPCA) by a combination of molecular replacement and
two-fold density averaging. The structure presented here is the first of an enzyme associated with a human PPCA-related
pathology, and the third human lysosomal enzyme structure determined. The structure gives insight into the zymogen
activation mechanism of pPPCA, as well as the expected 3-D structure of PPCA and its specific and new enzymatic
activities.

PPCA and pPPCA Expression and Purification

Plasmid Constructs. AcMNPV transfer plasmids pJR2 and pBC3 (Figure 1) were derivatives of plasmid
pAc373, carrying the entire polyhedrin gene (Smith et al., 1985). In pJR2, a polylinker with multiple cloning sites (MCS)
was inserted directly 3' of the polyhedrin promoter, in place of a 33-nucleotide deletion of the polyhedrin gene starting
with the ATG. pBC3 had the polylinker situated in a position similar to that in pJR2, but instead of the 33-nt deletion this
plasmid featured an ATG codon mutated to ACG. Full-length human PPCA cDNA, PPCA54 (Galjart et al., 1988), and the
two deletion cDNA mutants, 32(Δ20) and 20(Δ32) (Galjart et al., 1991), were subcloned into either pJR2 or pBC3 as EcoRI
fragments, using standard procedures (Sambrook et al., 1989) (Figure 1). The 20(Δ32) deletion mutant was tagged with
the human PPCA signal sequence, as reported earlier (Galjart et al., 1991). All cDNA fragments were engineered to have
short 3' and 5' untranslated regions (< 10 bp).

Transfection and Selection of Recombinant Baculovirus. Spodoptera frugiperda insect cells (IPLB-SF21)
were cultured in monolayers at 27 °C in TNM-FH medium (Hink, 1970), supplemented with 10% FBS and antibiotics
(complete medium). Wild-type (wt) AcMNPV virus strain E2 (Smith and Summers, 1978) and recombinant
baculoviruses were propagated on confluent monolayers of Sf21 cells. Recombinant constructs AcPPCA54, AcPPCA32
and AcPPCA20 were generated by cotransfecting Sf21 cells with 1 µg wt-AcMNPV DNA and 10 µg plasmid DNA,
using the calcium phosphate method modified for insect cells (Graham et al., 1973; Carstens et al., 1980; Summers et
al., 1987). Polyhedrin-negative recombinant baculoviruses were then selected and purified by sequential plaque assays,
and verified by dot blot and Southern blot analysis (Summers et al., 1987). Large quantities of inoculum were produced
by infection of insect cells at 25-50% confluency with recombinant virus at a multiplicity of infection (MOI) of
< 1 pfu/cell. After 3 to 6 days at 27 °C, when all cells appeared infected, the medium was harvested and centrifuged for
5 min at 1000 rpm to remove detached cells. The titer of the inoculum was determined by plaque assay analysis.
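Once the plaque-assay titer is known, the multiplicity of infection fixes how much inoculum to add: pfu needed = MOI × number of cells, and volume = pfu needed / titer. A minimal sketch of that arithmetic; the cell count and titer below are hypothetical illustration values, not figures from this work.

```python
def inoculum_volume_ml(moi_pfu_per_cell: float, n_cells: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock (ml) needed to reach a target MOI.

    MOI = pfu added / number of cells, so pfu needed = MOI * n_cells,
    and the volume of stock is pfu needed / titer.
    """
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi_pfu_per_cell * n_cells / titer_pfu_per_ml


# Hypothetical values: 2e7 cells per flask, stock titer of 1e8 pfu/ml.
amplification = inoculum_volume_ml(0.5, 2e7, 1e8)   # MOI < 1, as for inoculum production
production = inoculum_volume_ml(5.0, 2e7, 1e8)      # MOI 5-10, as for protein production
print(amplification, production)
```

Keeping the MOI below 1 during amplification leaves most cells uninfected at first, so the virus goes through several rounds of replication; the higher MOI used later aims to infect essentially all cells synchronously.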

Protein Purification and Western Blotting. Sf21 cells were cultured in either 175 cm² or 500 cm² flasks (triple
flask, Nunc) to near confluency, and infected with recombinant baculoviruses at an MOI of 5-10 pfu/cell. After 1.5 h
incubation at 27 °C, the inoculum was replaced with complete medium for an additional 8 to 10 h. Cell monolayers were
then rinsed with PBS and cultured further for 38 h in unsupplemented Grace's medium. After infection the medium was
collected and centrifuged for 5 min at 1500 g, and for 1 h at 100,000 g (Beckman SW-28 rotor) to remove virus particles.
After centrifugation the supernatant was concentrated 20-fold in an Amicon stirred cell. Glycoproteins were purified
-60% using a concanavalin A-SEPHAROSE affinity chromatography column, as described earlier (Verheijen et al.,
1982). Total protein concentration was measured using the method of Smith et al. (1985). Aliquots of the purified
preparation were resolved on 12.5% SDS-polyacrylamide gels under reducing and non-reducing conditions. Gels were
either Coomassie brilliant blue- or silver-stained (Sambrook et al., 1989). For western blotting, proteins were transferred
from gels to IMMOBILON PVDF membranes (Millipore Corp.), using a semidry blotter (The W.E.P. Company).

Development and Use of pPPCA Antibodies. A 15-amino-acid peptide
(NH2-Cys-Met-Trp-His-Gln-Ala-Leu-Leu-Arg-Ser-Glu-Asp-Lys-Ala-Arg-COOH) (Figure 5), based on the C-terminal
sequence of the 34-kDa PPCA subunit (amino acids 285-298, Galjart et al., 1988), was synthesized on a peptide
synthesizer (Applied Biosystems) and covalently linked to the carrier protein keyhole limpet hemocyanin, using the
IMJECT ACTIVATED IMMUNOGEN CONJUGATION KIT (Pierce). Polyclonal antibodies against the conjugated product
were raised in rabbit by multiple subdermal injections of the protein (40-125 µg) mixed with incomplete Freund's
adjuvant (Pierce). Rabbits were bled 34 days after the first injection. The antibodies, designated anti-pep, were tested on
immunoblots and by immunoprecipitation of baculovirus-produced PPCA.

Blots were incubated for at least 12 h in blocking buffer (0.01 M Tris-buffered saline (TBS), pH 8.0, 0.05%
Tween 20, and 3% (w/v) BSA), and subsequently probed for 2 h with polyclonal PPCA antibodies, anti-54, diluted 1:200
in fresh blocking buffer. They were then washed for 1 h in TBS, 0.05% Tween 20, and incubated for 2 h with alkaline
phosphatase-conjugated anti-rabbit IgG (Sigma, 1:1000 in blocking buffer). Proteins were visualized using alkaline
phosphatase substrate (Sigma; 4-aminodiphenylamine diazonium sulfate, naphthol AS-MX phosphate).

Crystallization of PPCA. Fractions containing the precursor form of the protein, as assayed on an SDS-PAGE
gel, were pooled. Subsequently the protein was concentrated to 5 mg/ml and the buffer exchanged to 50 mM NaAc pH
5.2 or 50 mM MES pH 6.5 using a CENTRICON-10. Crystals were grown using the hanging drop vapor diffusion
technique. Crystals suitable for data collection were grown using a reservoir solution containing 2-10% PEG 8000,
pH 8.0-8.3, 50 mM TRIZMA, 1 mM NaN₃, 0.25% β-octyl glucoside at 4-12 °C. Mixing non-equal volumes of protein
solution (in the range 5-10 µl) and reservoir solution (in the range 2-6 µl) enhanced the occurrence of single large
crystals per drop under these crystallization conditions. The concentration of the protein solution before mixing was
5 mg/ml. Crystal growth was enhanced by macrocrystallization techniques (any technique that promotes the growth of
large crystals) and in some cases by micro- and macroseeding techniques.
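One way to see what unequal drop volumes change is an idealized vapor-diffusion mass balance. The sketch below assumes that only water leaves the drop, that PEG is the only component that matters, and that the drop equilibrates fully when its PEG concentration equals the reservoir's; real drops also exchange other volatile species and may not reach full equilibrium. The 8% PEG value is a hypothetical point inside the 2-10% range given above.

```python
def hanging_drop(v_protein_ul: float, v_reservoir_ul: float,
                 c_protein_mg_ml: float, peg_reservoir_pct: float) -> dict:
    """Idealized hanging-drop vapor diffusion (sketch, see assumptions above)."""
    v_drop = v_protein_ul + v_reservoir_ul
    # Initial concentrations right after mixing protein and reservoir solutions.
    peg_initial = peg_reservoir_pct * v_reservoir_ul / v_drop
    protein_initial = c_protein_mg_ml * v_protein_ul / v_drop
    # PEG mass is conserved: peg_reservoir * v_final = peg_initial * v_drop,
    # which simplifies to v_final = v_reservoir.
    v_final = v_reservoir_ul
    protein_final = c_protein_mg_ml * v_protein_ul / v_final
    return {
        "peg_initial_pct": peg_initial,
        "protein_initial_mg_ml": protein_initial,
        "drop_final_ul": v_final,
        "protein_final_mg_ml": protein_final,
    }


# 10 ul of 5 mg/ml protein mixed with 2 ul of a (hypothetical) 8% PEG reservoir.
print(hanging_drop(10.0, 2.0, 5.0, 8.0))
```

Under these assumptions a 1:1 drop returns to the starting protein concentration at equilibrium, whereas a 10:2 drop ends near 25 mg/ml with a lower starting PEG concentration, so the ratio changes both the starting point and the end point of the equilibration path.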

Example 2: Structure Determination of a pPPCA Crystallized from Human Cells 
Data Collection, Data Processing and Reduction. 

To allow for data collection at cryotemperatures, the crystals were cryoprotected by adding glycerol in 5-10%
steps to a solution of about 12% PEG 8000, 50 mM TRIZMA, pH 8.0, 1 mM NaN₃, 0.25% β-octyl glucoside, which
served as an artificial mother liquor. The crystals were incubated for half an hour at 40 °C after each addition of
glycerol. The final mother liquor contained 30% glycerol. Increasing the glycerol concentration gradually was needed
to keep the crystals from cracking.

Diffraction data were collected at the Stanford Synchrotron Radiation Laboratory (SSRL) to 2.0 Å at -178 °C
on a MAR imaging plate at a wavelength of 1.08 Å on beamline 7-1. The diffraction data (corresponding
to the atomic coordinates of monomer 1; the other monomer coordinates are provided by matrix conversion of these
coordinates, as presented herein) were processed and reduced using MOSFLM version 5.2 from the CCP4 program
package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory, UK, 1979). The program REFIX
(Kabsch (1993), infra) was used for auto-indexing. Using the CCP4 program suite (SERC (UK) Collaborative
Computing Project 4, Daresbury Laboratory, UK, 1979), the intensities were scaled (ROTAVATA), merged
(AGROVATA), then converted to amplitudes and truncated with the program TRUNCATE. Statistics of the data
collected are given in Table 1. The V_M (Matthews, B.W., J. Mol. Biol. 33:491-497 (1968)) is 3.2 Å³/Da for 2 monomers
in the asymmetric unit, corresponding to a solvent content of 62%.
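The solvent content follows from the Matthews coefficient V_M = V_cell / (Z · M), where Z is the number of molecules in the unit cell (asymmetric units per cell × molecules per asymmetric unit) and M the molecular mass, using the common approximation solvent fraction ≈ 1 - 1.23/V_M (1.23 Å³/Da corresponds to a protein partial specific volume of about 0.74 cm³/g). A minimal sketch; the cell volume and mass are not given in this section, so the first function is only exercised with toy numbers.

```python
def matthews_coefficient(cell_volume_a3: float, n_asym_units: int,
                         mols_per_asu: int, mw_da: float) -> float:
    """V_M in A^3/Da: cell volume divided by total protein mass in the cell."""
    return cell_volume_a3 / (n_asym_units * mols_per_asu * mw_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction, assuming a protein density of 1/1.23 Da/A^3."""
    return 1.0 - 1.23 / vm


# With V_M = 3.2 A^3/Da, as reported for two monomers per asymmetric unit:
print(round(solvent_fraction(3.2) * 100))
```

The printed value rounds 61.6% to 62%, matching the solvent content quoted above. For a P2₁2₁2 cell, n_asym_units would be 4.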
Molecular Replacement 

Search Model: The best molecular replacement results were obtained using a multi-Ala core as a search probe.
The 'multi-Ala core' search model was constructed from the atomic coordinates of the CPW monomer (Liao et al., 1992)
based on the sequence alignment presented in Figure 15. Regions expected to deviate in structure between PPCA and
CPW (i.e., those with low sequence identity or located in loops) were deleted from the model. The 125 residues identical
in PPCA and CPW were left in the model; 112 residues were truncated to alanine. The remaining 94 residues, though
differing between CPW and PPCA, were considered sufficiently similar in size, and the CPW residue was left as such in
the model. The resulting 'multi-Ala core' monomer consisted of 331 residues, constituting a large portion of the core
domain and little atomic information for the 'cap' domain (see Figure 1). The model contained 30% of the expected
protein scattering mass, given that there are two monomers in the asymmetric unit. The sequence identity between this
search model and the true PPCA structure was 37.7%.

Rotation Function, PC Refinement and Translation Function: Native data from 8 to 4 Å were used in the
molecular replacement calculations. The rotational searches utilized a real-space Patterson search method, as
implemented in X-PLOR (Steigemann, 1974; Huber, 1985; Brünger, 1992a), with a Patterson vector cutoff of 21 Å. The
self-rotation function failed to reveal any non-crystallographic two-fold symmetry relating two monomers in the
asymmetric unit. In addition, the native self-Pattersons did not reveal the presence of a non-crystallographic two-fold
axis parallel to a crystallographic axis. These results indicated that the two monomers in the asymmetric unit might not
form a dimer together. The cross-rotation function was carried out to find the orientation of the two monomers in the
asymmetric unit as follows. Patterson vector sets were calculated for the search model and the native data, and the 8000
strongest Patterson vectors were used in the rotation function. The rotational space, restricted to the asymmetric unit of
the rotation function according to Rao et al., 1980, was sampled by rotating the Patterson vectors from the search model
around Eulerian angles θ1, θ2 and θ3, while sampling θ2 in angular grid intervals of 2.5°. The 5000 highest rotation
function grid points resulting from the product function of the two Patterson vector sets were selected. The grid points
(differing less than 8° around any given axis) were then clustered. The result was a list of 169 possible solutions for
the rotation function, each corresponding to a set of three angles describing an orientation. The two top solutions were
3.9 and 3.8 σ above the mean. PC-refinement (Brünger, 1990) was carried out to optimize each of the 169 possible
solutions using the complete search model as a single rigid body. This yielded two orientations with PC-indices of 0.043
and 0.051, respectively. The orientations of these solutions were (θ1 = 261.4, θ2 = 36.22, θ3 = 147.28) and (θ1 =
18.52, θ2 = 47.40, θ3 = 23.22), respectively. In contrast, the rest of the possible solutions yielded an average PC-index
of 0.022.
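The clustering step above merges rotation-function peaks that lie within 8° of each other. A minimal sketch of one greedy way to do this, taking the text's criterion literally (every Euler angle within 8°, with 360° wrap-around); it is an illustrative assumption, not the X-PLOR algorithm, and it ignores the equivalent Euler-angle triples that a production program must also account for.

```python
from typing import List, Tuple

Peak = Tuple[float, float, float, float]  # (theta1, theta2, theta3, peak height)


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cluster_peaks(peaks: List[Peak], tol_deg: float = 8.0) -> List[Peak]:
    """Greedy clustering: each peak joins the first higher peak within tol_deg on all
    three angles; otherwise it starts a new cluster. Returns one peak per cluster."""
    representatives: List[Peak] = []
    for peak in sorted(peaks, key=lambda p: p[3], reverse=True):
        close = any(
            all(angle_diff(peak[i], rep[i]) < tol_deg for i in range(3))
            for rep in representatives
        )
        if not close:
            representatives.append(peak)
    return representatives


toy_peaks = [(5.0, 20.0, 30.0, 5.0), (7.0, 22.0, 28.0, 4.0),
             (200.0, 40.0, 100.0, 3.0), (359.0, 22.0, 31.0, 2.0)]
print(cluster_peaks(toy_peaks))
```

The toy peaks are invented; the fourth one is merged with the first because 359° and 5° differ by only 6° once the wrap-around is taken into account.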

Individual translation function calculations were performed on a 1 Å grid. A translational solution was found
for each orientation at positions (x = 33.30, y = 51.97, z = 12.79) and (x = 25.23, y = 28.58, z = 22.02), with respect
to the crystallographic center, at 7.7σ and 8.8σ above the mean, respectively. The R-factor for the individual solutions
was 55.6% and 54.8% in the resolution range 8.0 to 4.0 Å, with correlation coefficients (CC) of 0.095 and 0.114. A
combined translation function was calculated to place each solution relative to the same crystallographic origin, resulting
in an R-factor of 52.8% for data between 8.0 and 4.0 Å, bringing the R-factor down to 51.3% and increasing the CC to 0.22.
The molecular packing was assessed on a graphics workstation, which revealed no clashes between the placed search
probes. However, a very large amount of empty space was present. The packing showed that the asymmetric unit
contained two half-dimers, each forming a dimer with another monomer in a neighboring unit cell. The two cores in
the asymmetric unit were related by κ = 73° around an axis tilted 15.5° off the crystallographic a axis, lying in the ax
plane.
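The R-factor and correlation coefficient used above to judge the translation solutions can be written as R = Σ||F_obs| - k|F_calc|| / Σ|F_obs| and a Pearson correlation between observed and calculated values. A minimal sketch on amplitudes, assuming a least-squares scale factor k; the exact scaling, and whether the CC is computed on amplitudes, intensities or normalized structure factors, varies between programs and is not stated here.

```python
import math
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Crystallographic R-factor with a least-squares scale k minimising sum (Fo - k*Fc)^2."""
    k = sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy amplitudes: a model that is exactly proportional to the data gives R = 0 and CC = 1.
print(r_factor([10.0, 20.0, 30.0], [1.0, 2.0, 3.0]), correlation([10.0, 20.0, 30.0], [1.0, 2.0, 3.0]))
```

For random, unrelated acentric data the R-factor approaches roughly 59%, which is why values in the low 50s, as reported above, already indicate a partially correct placement.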

Iterative Model Building and Two-fold Averaging 

Initial Electron Density Map: A 2m|F_obs| - D|F_calc| σA-weighted map (Read, 1986) was calculated using
the |F_obs| values and phases from the molecular replacement solution. The map was contoured at 1σ and showed good
density for most of the core. Density emerged for many side chains where the input model residue had been an Ala,
indicating that the molecular replacement solution was correct.
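Each reflection contributes a Fourier coefficient to the σA-weighted map: (2m|F_obs| - D|F_calc|)·exp(iφ_calc) for acentric reflections and m|F_obs|·exp(iφ_calc) for centric ones (Read, 1986). In practice m and D are estimated from σA in resolution shells; in this sketch they are simply passed in as numbers, and the map itself (an inverse Fourier transform of these coefficients) is not computed.

```python
import cmath
import math


def sigmaa_coefficient(f_obs: float, f_calc: float, phi_calc_deg: float,
                       m: float, d: float, centric: bool = False) -> complex:
    """Map coefficient for one reflection of a sigmaA-weighted 2mFo-DFc map."""
    amplitude = m * f_obs if centric else 2.0 * m * f_obs - d * f_calc
    return amplitude * cmath.exp(1j * math.radians(phi_calc_deg))


# Toy values: a perfect model (m = D = 1, Fo = Fc) returns Fo with the model phase.
print(sigmaa_coefficient(10.0, 10.0, 0.0, 1.0, 1.0))
```

The 2m|F_obs| - D|F_calc| form is what lets density appear for atoms absent from the model, such as the side chains that emerged where the search model carried only alanine.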

First Model Built: The two rotated and translated search probes formed the starting point for model building
of the PPCA precursor. The non-crystallographic symmetry (NCS) matrix was determined between the two cores using
the "Lsq_explicit" option in the computer program O (Jones et al., 1991). Subsequently a 'best monomer' was built by
superimposing the electron densities from each monomer core and adjusting the model accordingly. Residues were only
incorporated in the model where the electron density was visible for the complete side chain. Residues from the search
model for which no density was visible were removed. An alanine was built in the model at places where electron
density for a side chain was partial. In this manner 294 residues, i.e. 65% of the Cα atoms, were built in the 'best
monomer' core. The second monomer was generated from the 'best monomer' model using the NCS operator relating
the two monomers in the asymmetric unit. At this point the data set was partitioned into a working set and a test set
consisting of 5% of the reflections between 8 and 2.2 Å to monitor R_free (Brünger et al., 1992b). The working data set
was used for rigid-body and positional refinement. For averaging and map calculations the unpartitioned data set was
used. Twenty-five cycles of refinement using the two 'best monomer cores' positioned in the asymmetric unit as rigid
bodies, and data from 8.0 to 3.0 Å, resulted in an R-factor of 53.5% for this resolution range. The atomic coordinates of
this partial model were used to calculate a new 2m|F_obs| - D|F_calc| σA-weighted map, which we called the 'best
monomer map'.
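The working/test partition for R_free can be drawn at random from the reflections in the chosen resolution range. A minimal sketch, assuming reflections are supplied as ((h, k, l), d-spacing) pairs and that exactly 5% of the in-range reflections are selected; the actual program used for the selection is not stated, and the seed is an arbitrary illustrative choice.

```python
import random
from typing import List, Tuple

Reflection = Tuple[Tuple[int, int, int], float]  # ((h, k, l), d-spacing in Angstrom)


def split_free_set(reflections: List[Reflection], fraction: float = 0.05,
                   d_max: float = 8.0, d_min: float = 2.2,
                   seed: int = 0) -> Tuple[List[Reflection], List[Reflection]]:
    """Return (working set, test set); the test set is drawn only from d_min <= d <= d_max."""
    in_range = [i for i, (_, d) in enumerate(reflections) if d_min <= d <= d_max]
    n_test = round(fraction * len(in_range))
    test_idx = set(random.Random(seed).sample(in_range, n_test))
    work = [r for i, r in enumerate(reflections) if i not in test_idx]
    test = [r for i, r in enumerate(reflections) if i in test_idx]
    return work, test
```

The test reflections are never used in refinement, so R_free measures agreement with data the model was not fitted to; as the text notes, maps and averaging may still use the full data set.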

Averaging: Search for Missing Density: The phasing power from the rigid-body-refined 'best monomer
cores', consisting of 294 residues per core, was insufficient to bring back interpretable electron density for the missing
part of the model, 158 residues per monomer. To overcome this, a 'bootstrapping' procedure was applied, entailing
density averaging using RAVE (Kleywegt & Jones, 1994a) and model expansion. The 'best monomer map' and the
rigid-body-refined 'best monomer cores' served as the starting point for this procedure.
Six bootstrapping cycles were carried out, called bmc1 through bmc6, allowing the model to be extended
in stepwise increments. Figure 16 shows a scheme of the steps incorporated in one bootstrapping cycle. After a cycle
in which the model had undergone major expansion, a new molecular mask was calculated with MAMA (Kleywegt &
Jones, 1994b) for use in the subsequent bootstrapping cycle. No phase recombination was applied between
bootstrapping cycles. At the end of each cycle the inverted phases and inverted amplitudes were discarded.
The NCS operator was re-optimized after cycle bmc3. The resolution range of the data included in the bootstrapping
cycle started at 15-3.0 Å for bmc1 and was gradually extended to 15-2.7 Å in bmc6. The bootstrapping procedure
is summarized in Table 2. To optimize the bootstrapping procedure, consideration was given to the molecular mask used
in the averaging, the model building strategy and the refinement procedure.
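The core of each bootstrapping cycle is two-fold NCS averaging: the density at each point inside the molecular mask is averaged with the density at the NCS-related point x' = Rx + t. A simplified sketch, assuming an orthogonal grid with its origin at index 0, a scalar grid spacing in Å, and nearest-grid-point sampling; programs such as RAVE interpolate, handle crystal symmetry and unit-cell wrap-around, and write the averaged density back to both copies, none of which this sketch does.

```python
import numpy as np


def ncs_average(rho: np.ndarray, mask: np.ndarray, rot: np.ndarray,
                trans: np.ndarray, spacing: float) -> np.ndarray:
    """Average density inside `mask` with its NCS-related copy (simplified sketch)."""
    idx = np.argwhere(mask)                          # (N, 3) grid indices inside the mask, C order
    xyz = idx * spacing                              # Cartesian coordinates in Angstrom
    mate = xyz @ rot.T + trans                       # apply the NCS operator x' = R x + t
    mate_idx = np.rint(mate / spacing).astype(int)   # nearest grid point to each mate position
    mate_idx = np.clip(mate_idx, 0, np.array(rho.shape) - 1)
    rho_mate = rho[mate_idx[:, 0], mate_idx[:, 1], mate_idx[:, 2]]
    out = rho.copy()
    out[mask] = 0.5 * (rho[mask] + rho_mate)         # rho[mask] uses the same C order as argwhere
    return out
```

Density outside the mask is left untouched here; this is one reason the masks described below were deliberately kept too large where the model was still missing, so that real density in those regions would not be flattened.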

Molecular masks: Four different masks were constructed in total. The atomic radius of all atoms was set to
4 Å to calculate each mask. The masks were then manually modified using mask editing options in O (Jones et al., 1991).
Mask 1 was constructed around the 'best monomer core'. Subsequently it was greatly enlarged by multiple blocks of
10-15 Å³ in the regions where the model was incomplete (Figure 17). This was crucial to prevent the density in the
insertion areas from being flattened during the averaging step. Approximately one half of the dimer interface was
estimated to be formed by regions from the missing cap domain. Major expansions of the mask in this area were made
to accommodate this. This resulted in a serious overlap problem when the mask was duplicated to cover a complete
dimer. The mask was reduced where overlap occurred with the "-overlap_trim" option of MAMA. After several
bootstrapping cycles, newly incorporated polypeptide fragments were carefully assigned to one of the two monomers
forming the dimer, and the mask at the dimer interface areas was manually adjusted accordingly. Essentially, the masks
were kept far too large in regions where the model was missing in order to avoid erroneous flattening of electron density.
In contrast, the masks were tightened around the areas of the molecule where the model was complete.
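A mask built with a 4 Å atomic radius marks every grid point that lies within 4 Å of at least one model atom. A minimal sketch on an orthogonal grid with its origin at index 0 and a scalar spacing in Å (both assumptions); MAMA additionally smooths, fills and trims masks, which is not reproduced here.

```python
import numpy as np


def atom_mask(coords, shape, spacing: float, radius: float = 4.0) -> np.ndarray:
    """Boolean grid mask: True where a grid point is within `radius` Angstrom of any atom."""
    grid = np.indices(shape).reshape(3, -1).T * spacing   # (M, 3) grid coordinates, C order
    mask = np.zeros(grid.shape[0], dtype=bool)
    for atom in np.asarray(coords, dtype=float):
        mask |= np.sum((grid - atom) ** 2, axis=1) <= radius ** 2
    return mask.reshape(shape)


# One toy atom at the centre of a 21 x 21 x 21 grid with 1 Angstrom spacing.
m = atom_mask([(10.0, 10.0, 10.0)], (21, 21, 21), 1.0)
print(int(m.sum()))
```

Enlarging such a mask with extra blocks, as described above, simply sets further grid points to True in regions where no atoms have been built yet.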

Model Building: A conservative model building strategy was adopted. Initially only side chains were mutated
in the core region to fit the PPCA amino acid sequence, and, where the density was clear, poly-alanine fragments were
built in the insertion areas (loops and the cap domain). Newly included atoms were given a B-factor of 20 Å². Only
once models bmc5 and bmc6 were obtained was the electron density of sufficient quality to allow side chains to be
incorporated confidently in the cap domain (residues 190-303). At this stage the Cα trace was virtually complete for
the whole dimer and the sequence could be fit unambiguously.

Refinement: Positional refinement was postponed until after 3 cycles of bootstrapping, resulting in a
model containing 91% of the Cα atoms. Forty steps of positional refinement were then carried out to improve the
geometry of the model. Subsequently only one of the refined monomers was taken and the other generated using NCS
operators. The rationale for delaying the positional refinement is addressed in the discussion.

Completing the model: deviations from two-fold symmetry. It was possible to add 148 residues and 185 side
chains per monomer after a total of 6 bootstrapping cycles. At this stage, each subunit contained 442 residues and 413
side chains, i.e. 98% of the Cα atoms and 91% of the side-chain atoms. The gradual model expansion as a function of
the bootstrapping cycle is shown in Figure 18.

Twenty residues were still missing in the asymmetric unit at this stage. These were localized to two stretches
per monomer (260-262 and 287-292). With most of the scattering mass incorporated, the monomers from model bmc6
were refined individually with X-PLOR (Brünger, 1992a) in an attempt to retrieve electron density for the still-missing
residues. After 40 steps of positional refinement using data from 8.0-2.6 Å, the R-factor dropped significantly from
40.2% to 33.2%. The model was further positionally refined using a full weight W_A on the crystallographic term. The
data included in the refinement were gradually extended to 2.2 Å. At 2.4 Å resolution individual B-factors were refined
and their distribution checked as a function of atom location (i.e., low B-factors in the core and high B-factors on the
surface). Cycles of refinement and refitting allowed 18 missing residues to be added. Essentially the complete cap
domain was retrieved using the bootstrapping procedure, as shown in Figure 19. It became apparent from the refined
maps that the two stretches of missing amino acids adopted very different conformations in the two monomers (with
an average r.m.s.d. of as much as 7.9 Å for the Cα atoms of residues 287-292). For this reason electron density for these
regions had not been retrieved in the two-fold averaging process. The stepwise improvement of the electron density
maps with averaging, model expansion and refinement is shown in Figure 6.

The program ARP was used to check our model, in particular the region at the dimer interface (Lamzin &
Wilson, 1993). Prior to the final round of positional refinement, an |F_obs|/σ cutoff was applied to reject 10% of the
weakest data, and an anisotropic scale factor was applied to offset the decreased resolution along the crystallographic
a axis. The final model is of good geometry with a final R-factor of 21.3% (R_free of 26.8%) for data between 8.0 and
2.2 Å (see Table 3). A Ramachandran plot is given in Figure 21. The r.m.s. coordinate error is 0.282 Å as calculated by
SigmaA (Read, 1986). The average phase difference between the initial molecular replacement model and the currently
refined model is calculated to be 71° for data between 10 and 2.2 Å.
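The mean phase difference quoted above compares two phase sets reflection by reflection, with each difference folded into the range 0-180°. A minimal sketch; whether the reported 71° was amplitude-weighted is not stated, so weighting is left optional here.

```python
from typing import Optional, Sequence


def mean_phase_difference(phi1_deg: Sequence[float], phi2_deg: Sequence[float],
                          weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) mean absolute phase difference in degrees, each difference wrapped into [0, 180]."""
    diffs = []
    for a, b in zip(phi1_deg, phi2_deg):
        d = abs(a - b) % 360.0
        diffs.append(min(d, 360.0 - d))
    if weights is None:
        weights = [1.0] * len(diffs)
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


print(mean_phase_difference([10.0, 350.0], [350.0, 10.0]))
```

Because completely unrelated phases give an expected mean difference of 90°, a value of 71° indicates starting phases that were poor but carried some information, consistent with the description of very poor molecular replacement phases.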

The structure determination of PPCA is special in that two-fold averaging could be applied to refine very poor
molecular replacement phases, enabling us to retrieve electron density for 148 residues and 185 side chains per
monomer. In total 314 complete residues were added per asymmetric unit, equivalent to about 35 kDa of protein. In
retrospect we feel that a number of factors contributed to the successful structure determination.

Crystal Packing. Each monomer in the crystal interacts with four non-crystallographically related
monomers. By far the most extensive contact is with a non-crystallographically related monomer, generating the
physiological dimer. The three additional contacts are extensive crystal contacts ranging from 200-800 Å² averaged per
monomer. The largest non-dimer crystal contact involves the precursor loops from two crystallographically independent
monomers (regions 265-267 and 281-295 from monomer 1 with residues 281-293 from monomer 2) making intimate contact
with each other. Summed together, these loops create an intermolecular buried surface of 1680 Å². We believe that this
stabilizes an otherwise very flexible area, possibly explaining the good diffraction qualities of the P2₁2₁2 crystals.

It is also in this crystal contact that we find deviating spatial conformation and secondary structure between
the two monomers, as mentioned before. The electron density in this region is of very good quality, with average
temperature factors of 16.6 Å² for main-chain and 18.3 Å² for side-chain atoms.

pPPCA and the Hydrolase Family. The fold of pPPCA belongs to the large hydrolase fold family containing
enzymes such as the serine carboxypeptidases, dehalogenase, various lipases and acetylcholinesterase (Ollis et al.
(1992), infra), which have various catalytic functions. Though the central core is the same (a central β-sheet flanked
by α-helices on both sides), the proteins in this family all seem to have different 'cap' domains, with respect to both fold
and size (Figure 7A-F). pPPCA has one of the largest cap domains, comprising 121 residues that form the three-helical
bundle of the helical subdomain and a three-stranded β-sheet of the maturation subdomain.

Major Differences and Comparison With the Serine Carboxypeptidases. The overall fold of the pPPCA
monomer is similar to that of the wheat and yeast serine carboxypeptidases (Endrizzi et al. (1994), infra; Ollis et al.
(1992), infra). The complete core domains of pPPCA and CPW superimpose with an r.m.s. deviation of 1.7 Å for 302
Cα atoms, with 38% sequence identity. Deleting major deviating loops from the core domain allows pPPCA to
superimpose with an r.m.s. deviation of 1.2 Å onto CPW and CPY (293 equivalent Cα atoms with 40% sequence identity
for CPW/pPPCA, and 271 equivalent Cα atoms with 42.2% identity for CPY/pPPCA).
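The r.m.s. deviations above come from a least-squares superposition of equivalent Cα atoms. A minimal sketch of such a superposition using the Kabsch SVD method, which may differ from the program actually used, and assuming the equivalent atoms have already been paired (by sequence or structural alignment) into two equally ordered coordinate arrays.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between paired coordinate sets (N x 3) after optimal rigid-body superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                    # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                             # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T     # rotation taking p_c onto q_c
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Trimming the "major deviating loops" before superposition, as done above, simply removes poorly matching pairs from p and q, which is why the r.m.s. deviation drops from 1.7 Å to 1.2 Å.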

The cap domain in pPPCA differs significantly from its CPW and CPY counterparts. The pPPCA structure
reveals a large maturation subdomain not present in the structures of CPW and CPY, for which the structures of the
enzymatically active forms are known. All three enzymes contain a three-helical bundle in the cap domain. The sequence
identity between the three proteins in this region is very low (ca. 12%). In contrast, PPCA shows a much greater
structural deviation. Hα1 superimposes reasonably well with its CPW counterpart, maintaining the same general
orientation with respect to the core domain (requiring a rotation of only 7.4°). However, helices Hα2 and Hα3 have
undergone major rotations with respect to Hα1 and the core domain, by κ = 28.5° and κ = 93.4°, respectively (Figure 8A).
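The rotation angles κ quoted here (and the κ = 73° relating the two cores earlier) are the angle of the rotation matrix obtained from a superposition, which satisfies cos κ = (trace(R) - 1)/2. A minimal sketch, assuming R is a proper 3 × 3 rotation matrix, for example the one produced by a least-squares fit of the helix atoms.

```python
import math
from typing import Sequence


def rotation_angle_deg(r: Sequence[Sequence[float]]) -> float:
    """Rotation angle kappa in degrees of a 3x3 rotation matrix: cos(kappa) = (trace - 1) / 2."""
    trace = r[0][0] + r[1][1] + r[2][2]
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))   # clamp small rounding errors
    return math.degrees(math.acos(c))


a = math.radians(28.5)
rz = [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
print(rotation_angle_deg(rz))
```

The angle does not depend on the direction of the rotation axis, which is why a separate statement of the axis is needed to describe the reorientation fully.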

Because of the integral role of the cap domain in forming the dimer interface, the dimers of PPCA and CPW were
compared. In the pPPCA and CPW dimers the monomers are oriented differently with respect to each other.
Superposition of the core domain of one monomer from each dimer shows that the second pair of monomers (forming
the respective dimers) differs by a remarkable 15° in orientation (Figure 8B). Thus, it appears that the extensive
differences in the cap domains lead to a different arrangement of the subunits in the dimers of PPCA and CPW.

Catalytic Triad and Enzymatic Mechanism. Our structure shows that the precursor PPCA has all the elements
proposed for the enzymatic machinery of the serine carboxypeptidase family (Liao et al. (1992), infra; Endrizzi et al.
(1994), infra), and it is the third structure elucidated belonging to this family of enzymes, after CPW and CPY. The
catalytic triad in the active site of pPPCA is formed by residues Ser 150, His 429 and Asp 372. The Oγ of Ser 150 forms
a good hydrogen bond with the Nε2 of His 429, with an N to O distance of 2.8 Å. The Nδ1 of His 429 is 2.7 Å from the
Oδ2 and 3.3 Å from the Oδ1 of Asp 372. Further, two backbone amides appear to orient the carboxylate group of
Asp 372. The N of Ala 374 is at a distance of 3.0 Å from a carboxylate oxygen of Asp 372, and the N of Cys 375 is at a
distance of 2.9 Å from the Oδ2 of Asp 372.
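Distances such as these are read directly from the refined coordinates. A minimal sketch that measures donor-acceptor distances and flags those within a hydrogen-bonding cutoff; the 3.5 Å cutoff is a common convention assumed here, and the coordinates in the usage line are invented toy values, not the PPCA model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def hbond_pairs(atoms: Dict[str, Coord], pairs: Sequence[Tuple[str, str]],
                cutoff: float = 3.5) -> List[Tuple[str, str, float, bool]]:
    """For each (atom_a, atom_b) pair, report the distance (Angstrom) and whether it is <= cutoff."""
    report = []
    for a, b in pairs:
        d = math.dist(atoms[a], atoms[b])
        report.append((a, b, round(d, 2), d <= cutoff))
    return report


# Hypothetical toy coordinates placed 2.8 A and 3.6 A apart.
toy = {"Ser OG": (0.0, 0.0, 0.0), "His NE2": (2.8, 0.0, 0.0), "X O": (0.0, 3.6, 0.0)}
print(hbond_pairs(toy, [("Ser OG", "His NE2"), ("Ser OG", "X O")]))
```

A distance criterion alone is only a first filter; donor-acceptor geometry (angles) is normally checked as well before a contact is called a hydrogen bond.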

The oxyanion hole, proposed to stabilize the negatively charged tetrahedral intermediate in serine
carboxypeptidases, is formed by the backbone amides of Gly 57 and Tyr 151 in PPCA. The 32 atoms of the catalytic
triad residues plus the oxyanion hole amides from PPCA, CPY and CPW superimpose with an r.m.s. deviation of 0.4
Å, indicating a very high degree of structural similarity between the active site of the PPCA precursor and those of the
fully active enzymes CPY and CPW (see Table 4). The carboxylate of Asp 372 and the imidazole of His 429 in PPCA are
non-planar, making an angle of approximately 60° between the imidazole and the carboxylate. A similar non-planarity
has been observed in CPW and CPY, in contrast to the planar orientation found in subtilisin- and trypsin-type serine
proteases (McPhalen et al., Biochemistry 27:6582-6598 (1988)).
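The angle between the imidazole and the carboxylate can be expressed as the angle between the normals of the two planes, each plane defined by three of its atoms (for example three imidazole ring atoms, and the Asp Cγ with its two carboxylate oxygens). A minimal sketch under that assumption; fitting a least-squares plane through all ring atoms would be a more robust alternative. The coordinates in the usage line are toy values.

```python
import numpy as np


def plane_normal(a, b, c) -> np.ndarray:
    """Unit normal of the plane through three points."""
    n = np.cross(np.subtract(b, a), np.subtract(c, a))
    return n / np.linalg.norm(n)


def interplanar_angle_deg(plane1, plane2) -> float:
    """Angle in degrees (0-90) between two planes, each given as three points."""
    n1 = plane_normal(*plane1)
    n2 = plane_normal(*plane2)
    cos = abs(float(np.dot(n1, n2)))           # ignore the arbitrary sign of each normal
    return float(np.degrees(np.arccos(min(1.0, cos))))


flat = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
tilted = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, np.cos(np.radians(60)), np.sin(np.radians(60))))
print(interplanar_angle_deg(flat, tilted))
```

A value near 0° would correspond to the coplanar His/Asp arrangement of the trypsin- and subtilisin-type proteases, and a value near 60° to the non-planar arrangement described for PPCA, CPW and CPY.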

In pPPCA, a pair of glutamic acid residues (Glu 69 and Glu 149) is positioned near the catalytic triad, with their
carboxylate groups interacting with each other. The carboxylate groups are located approximately 8 Å from the Oγ
of Ser 150, and lie at the bottom of the active site. An asparagine (Asn 55) is oriented such that it forms a hydrogen
bond to each of the two carboxylate groups of the glutamic acid pair, at an Nδ2 (Asn) to Oε (Glu) distance of 3.0 and
3.6 Å, respectively. In addition, the two carboxylates interact with each other via hydrogen bonds. This configuration
of two glutamic acid residues and an asparagine is conserved between pPPCA, CPW and CPY (see Table 4), and has
been implicated in regulating the low pH optimum of the carboxypeptidase activity found in the serine
carboxypeptidases (Liao et al. (1992), infra). Biochemical data have suggested that a functional group with an apparent
pKa value of 5.5 functions to bind the C-terminal carboxylate group of peptide substrates and is responsible for the
observed pH optimum of 5.5 (reviewed in Breddam et al. (1986), infra; Rawlings & Barrett (1994), infra). Together
with their structural data, Liao and colleagues (Liao et al. (1992), infra) have suggested that at pH 5.5 or below one or
both glutamates must be uncharged, while at a pH higher than 5.5 one or both of the carboxylates, which are oriented
opposite to each other, may become deprotonated, resulting in unfavorable electrostatic interactions. This would disturb
the hydrogen bonding pattern or result in structural perturbations causing the observed increase in K_m for peptide
substrates at high pH. In pPPCA the orientation of this pair of glutamic acids, as well as that of the asparagine, is
essentially identical to that of the equivalent residues in CPW and CPY (see Table 4), even though the structure has
been determined at pH 8. The CPW and CPY structures have been determined at pH 5.7 and at pH 6.5-7.0, respectively.
Thus, our structure appears to rule out large pH-induced conformational changes of these three residues at least up to a
pH value 2.5 units above that optimal for carboxypeptidase activity. However, the high degree of conservation of these
residues does indicate some role in a characteristic shared by all three enzymes.
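The size of the pH gap can be made concrete with the Henderson-Hasselbalch relation, fraction deprotonated = 1 / (1 + 10^(pKa - pH)). A minimal sketch, assuming the apparent pKa of 5.5 behaves as a single, independently titrating group; coupled carboxylates such as the Glu 69/Glu 149 pair can have shifted microscopic pKa values, so this is only an idealized estimate.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Apparent pKa 5.5 at the pH optimum, at the CPW crystallization pH, and at the pPPCA crystallization pH.
for ph in (5.5, 5.7, 8.0):
    print(ph, round(fraction_deprotonated(5.5, ph), 3))
```

Under this idealization the group would be half deprotonated at pH 5.5 but more than 99% deprotonated at pH 8, so the unchanged geometry observed at pH 8 argues against a large conformational response to that titration.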

From our comparison it is clear that the enzymatic machinery in the PPCA precursor form is in a conformation 
virtually identical to that found in the fully active CPW and CPY enzymes. On this basis, the conformation of the 
enzymatic machinery found in pPPCA is expected to faithfully represent the conformation that will be found in the 
active PPCA. 



Active Site, Substrate Specificity. PPCA has a substrate preference for hydrophobic residues in the P1 and/or
P1' binding pockets (Jackman et al., Hypertension 21:925-928 (1993)). In CPW the P1' pocket was identified to consist
of two tyrosine residues (Tyr 60 and Tyr 239), which form a long channel capped at the end by two acidic residues
(Glu 272 and Glu 398) (Liao et al. (1992), infra). This explains the strong preference of this enzyme for Arg and Lys as
the leaving group (Breddam et al., Carlsberg Res. Commun. 52:297-311 (1987)). In CPY a similarly shaped pocket is
formed by the residues Thr 60, Tyr 256, Leu 272 and Met 398 (Endrizzi et al. (1994), infra). In PPCA the analogous
residues are Tyr 247 and Asp 64, forming the sides of the pocket, with Met 430 and Thr 304 at the far end. This is
reasonably consistent with an overall preference of PPCA for a hydrophobic leaving group.

Inactivation Mechanism of the Precursor Form. During maturation of the PPCA precursor form, at most residues 285-298, which form the 'excision' peptide, are removed by an as yet unidentified protease(s). In vitro, the maturation event can be mimicked by digestion with trypsin, probably cleaving at Arg 284 as well as at Arg 292 and/or Arg 298. The residues forming the 'excision' peptide adopt distinctly different conformations in the two crystallographically distinct monomers forming the PPCA dimer in our crystal structure. Yet in both monomers this polypeptide region extends out from the protein surface and is almost completely accessible to solvent and proteases (Figure 9). Arg 284 and Arg 292 are particularly well exposed. The main chain atoms of Arg 298 are less accessible, being sandwiched between strand Mβ2 and a loop N-terminal to helix Cα6, while a salt bridge with Glu 264 renders the side chain atoms of Arg 298 partially solvent inaccessible.
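
The trypsin experiment relies on the enzyme's well-known specificity for cleavage after Arg or Lys. As a sketch, the function below lists candidate sites with the common simplified rule (cut C-terminal to K or R unless the next residue is Pro). The PPCA sequence itself appears in Figures 13 and 14 and is not reproduced here, so the example uses a made-up peptide numbered from a hypothetical residue 280; in the real protein the structure adds a second filter, since only well-exposed sites such as Arg 284 and Arg 292 are readily reached.

```python
from __future__ import annotations


def trypsin_sites(sequence: str, first_residue: int = 1) -> list[int]:
    """Return residue numbers after which trypsin is expected to cut.

    Simplified rule: cleave C-terminal to Lys (K) or Arg (R), except when
    the following residue is Pro (P). Solvent accessibility is ignored.
    """
    seq = sequence.upper()
    sites = []
    for i in range(len(seq) - 1):  # a C-terminal K/R has no following peptide bond
        if seq[i] in "KR" and seq[i + 1] != "P":
            sites.append(first_residue + i)
    return sites


# Hypothetical toy peptide, not the PPCA sequence.
print(trypsin_sites("GAVKRPLSRA", first_residue=280))
```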

The active site cleft is blocked by numerous residues from the maturation subdomain in the precursor form of PPCA. The catalytic triad is rendered solvent inaccessible by residues Asn 275, Ile 276 and Phe 277. These residues are part of the polypeptide Asp 272-Phe 277, which we call the 'blocking' peptide. This peptide is held down predominantly by hydrophobic contacts of Leu 273, Ile 276 and Phe 277 with the core domain residues Gly 57, Cys 60, Leu 180, Leu 190, Val 191, Leu 232, Val 235, Ile 246, Leu 280, Leu 282, Met 299 and Ala 373 (Figure 10). In addition, residue Asn 275 of the blocking peptide appears to fill what might be part of the P1 binding pocket in the mature form. Further inspection of the blocking peptide suggests that Gly 274, with Ramachandran angles φ ≈ 66° and ψ = 28°, might play a central role in the strand blocking the active site. A glycine at this position appears critical to allow the polypeptide chain to adopt a conformation with its main chain at a safe distance from the catalytic triad. This might help the blocking peptide to assume a conformation resistant to autocatalysis. The P1' binding pocket appears to be neatly filled by Pro 301, interacting with Thr 304, Tyr 247, Cys 60 and Cys 334. Thus substrate binding is not possible in the precursor form, because the substrate binding pockets are inaccessible.
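
The φ/ψ values quoted for Gly 274 place it in the region of positive φ, which is sterically accessible mainly to glycine. A minimal sketch of how such angles are obtained from coordinates: a torsion angle is computed from four consecutive backbone atoms (C(i-1), N, Cα, C for φ; N, Cα, C, N(i+1) for ψ), followed by a deliberately coarse region label. The region boundaries in `rough_region` are rough assumptions for illustration, not a validated Ramachandran classification.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))           # b0 projected onto the plane normal to b1
    w = _sub(b2, _scale(b1, _dot(b2, b1)))           # b2 projected onto the same plane
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def rough_region(phi, psi):
    """Very coarse Ramachandran label; boundaries are illustrative assumptions."""
    if phi > 0:
        return "positive phi (left-handed; mostly glycine)"
    if -120.0 <= psi <= 50.0:
        return "right-handed helical"
    return "extended (beta)"


print(rough_region(66.0, 28.0))  # the Gly 274 values quoted in the text
```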

We conclude that the inactivation mechanism of PPCA is based on blocking of the active site, and not upon changes in the position of functional groups involved in catalysis/transition state stabilization. The P1, P2 and P1' binding pockets are all rendered solvent inaccessible. The function of the blocking peptide seems to be to render the catalytic triad, as well as the region around the P1 and P2 binding pockets, solvent inaccessible. The blocking peptide, however, does not assume the conformation that a peptide substrate would adopt. It is carefully positioned in a manner different from that of a productive substrate, thereby avoiding being cleaved by the nearby catalytic residues, which are correctly poised for catalysis. A crucial observation is that the excision peptide itself does not bind in the active site cleft. Hence, removal of the excision peptide alone is not sufficient to allow solvent or substrate access to the active site.

Proposed Maturation Event and Extent of Conformational Rearrangement. The active site of the precursor of PPCA appears to be fully blocked by 49 residues of the maturation subdomain, as shown in Figure 11. Based on the precursor structure and the comparison with CPW and CPY, it is proposed that a region comprising approximately residues 254-284 rearranges to free the P1 and P2 binding sites, while residues 299-302 rearrange to free the P1' binding pocket. The linker connecting these two segments of polypeptide chain is the 14-amino-acid excision peptide Met 285-Arg 298. The extent of the rearrangement is likely to be limited by a disulfide bridge between Cys 253 and Cys 303, which is conserved in the serine carboxypeptidase family. This critical disulfide serves to keep the secondary structure elements together at the far end of the P1' pocket.
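
The residue ranges in this paragraph can be cross-checked by simple counting. On the reading assumed here (that the 49 blocking residues form the contiguous stretch 254-302, which the text does not state explicitly), the two rearranging segments contribute 31 + 4 = 35 residues, matching the "up to 35 residues" cited later in this section, and the excision peptide accounts for the remaining 14.

```python
def n_residues(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


# Residue ranges as given in the text.
segments = {
    "rearranging segment (P1/P2 sites)": (254, 284),
    "excision peptide": (285, 298),
    "rearranging segment (P1' pocket)": (299, 302),
}
lengths = {name: n_residues(*rng) for name, rng in segments.items()}
rearranging = (lengths["rearranging segment (P1/P2 sites)"]
               + lengths["rearranging segment (P1' pocket)"])

for name, n in lengths.items():
    print(f"{name}: {n} residues")
print("rearranging in total:", rearranging)
print("whole stretch 254-302:", n_residues(254, 302))
```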

An interesting pair of salt bridges is observed between Arg 262, Asp 300, Glu 264 and Arg 298, four residues located on two strands (one of them Mβ3) of the mixed β-sheet found in the maturation subdomain. This cluster of residues is strategically positioned at the base of the excision peptide, close to the core domain, 'shielding' the mixed β-sheet via side chain interactions (see Figure 11). These residues are strictly conserved among the human, mouse and chicken PPCAs (Galjart et al. (1991), infra). This charge cluster may be affected by a shift from neutral to acidic pH. Arrival in the endosome/lysosome is expected to result in protonation of the Asp residue, the Glu residue, or both, resulting in unfavorable electrostatic interactions and destabilization of this charge cluster. This in turn is expected to promote partial unfolding of the maturation subdomain, allowing easier access to additional potential cleavage sites and stimulating removal of the 'blocking' peptide from the active site, which it fills in the precursor.
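
Salt bridges such as those in this charge cluster are usually identified from coordinates with a simple distance criterion. The sketch below uses a 4.0 Å cutoff between carboxylate oxygens (Asp, Glu) and the charged nitrogens of Arg and Lys; that cutoff is a common convention but an assumption here. The coordinates are invented toy values arranged to reproduce the Arg 262-Asp 300 and Glu 264-Arg 298 pairing suggested by the text; real input would come from the refined model.

```python
import math

# Charged side-chain atoms by residue type (PDB atom names).
ACIDIC = {"Asp": {"OD1", "OD2"}, "Glu": {"OE1", "OE2"}}
BASIC = {"Arg": {"NE", "NH1", "NH2"}, "Lys": {"NZ"}}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return sorted (acidic residue, basic residue) pairs having any charged
    O...N distance <= cutoff (Å). `atoms` holds (residue_label, atom_name, xyz)
    tuples, where residue_label looks like "Asp 300"."""
    acids = [(r, xyz) for r, name, xyz in atoms if name in ACIDIC.get(r.split()[0], ())]
    bases = [(r, xyz) for r, name, xyz in atoms if name in BASIC.get(r.split()[0], ())]
    pairs = set()
    for res_a, xyz_a in acids:
        for res_b, xyz_b in bases:
            if math.dist(xyz_a, xyz_b) <= cutoff:
                pairs.add((res_a, res_b))
    return sorted(pairs)


# Invented toy coordinates (Å), not taken from the PPCA model.
toy_atoms = [
    ("Asp 300", "OD1", (0.0, 0.0, 0.0)),
    ("Arg 262", "NH1", (0.0, 0.0, 2.9)),
    ("Glu 264", "OE2", (10.0, 0.0, 0.0)),
    ("Arg 298", "NH2", (10.0, 3.1, 0.0)),
    ("Arg 298", "NE", (20.0, 0.0, 0.0)),
]
print(find_salt_bridges(toy_atoms))
```

A distance check alone cannot capture the pH effect described above; protonation of the Asp or Glu carboxylate removes the charge that makes the interaction a salt bridge even if the geometry is unchanged.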

A similar double salt bridge has been observed in the aspartic proteinase zymogen pepsinogen, between the proenzyme segment (Arg 8P) and the enzyme (Arg 308, Glu 13, Asp 304).

The maturation mechanism for pPPCA appears to be novel among proteases for which the three-dimensional structure of the zymogen is known. The catalytic triad in the precursor form is in a catalytically competent conformation. Enzymatic activity is prevented by a 'blocking' peptide. The blocking peptide, however, is different from the excision peptide and is not excised from the mature enzyme. This is the distinctive difference from the other known maturation mechanisms: after disappearance of the excision peptide, up to 35 residues filling the active site cleft in the PPCA precursor must rearrange to render the catalytic triad solvent accessible (see Figure 12), but are not cleaved off. Removal of the excision peptide, and possibly a shift to lower pH in the endosome/lysosome, appears to be a trigger for this event. The mechanism does not appear to be autocatalytic, as uptake experiments with cultured galactosialidosis fibroblasts have shown that a mutant PPCA in which the catalytic Ser 150 is mutated to Ala is properly targeted and processed. It retains its protective function and, except for the loss of catalytic activity, is biochemically indistinguishable from the wild-type enzyme (Galjart et al. (1991), infra). Surprisingly, the maturation mechanisms of the serine carboxypeptidases PPCA, CPW and CPY may all differ from each other as well. This is clearest for CPY, in which a 91-residue polypeptide is cleaved off N-terminally to convert the zymogen to an active enzyme (Winther and Sorensen, Proc. Natl. Acad. Sci. USA 88:9330-9334 (1991)), as opposed to the excision of a peptide from within the zymogen, generating a two-chain active form, as is the case for PPCA and CPW.

Looking at the hydrolase fold family, the catalytic triad is housed in the core domain, and the various cap domains modulate the biological function by influencing entirely different properties, such as: (i) enzyme kinetics, exemplified by the interfacial activation of lipases (Smith et al., Curr. Opinion in Structural Biology 2:490-496 (1992)); (ii) substrate channeling, as is proposed for acetylcholinesterase (Sussman et al. (1991), infra); (iii) substrate recognition, proposed for dehalogenase (Franken et al. (1991), infra) and for CPY and CPW (Endrizzi et al. (1994), infra); and (iv) enzyme inactivation in the case of PPCA.

Biological Implications. Deficiency of the protective protein/cathepsin A (PPCA) in humans results in the lysosomal storage disease galactosialidosis. PPCA is thought to form a multi-enzyme complex with β-galactosidase and neuraminidase in the lysosomes, protecting these glycosidases in their harsh, acidic and protease-rich environment. PPCA has 30% sequence identity to wheat serine carboxypeptidase (CPW) and yeast serine carboxypeptidase (CPY). It has been shown that PPCA in the precursor form is inactive, but upon maturation, which entails excision of a 2 kDa peptide, carboxypeptidase activity is released.

The precursor structure reveals an inactivation mechanism that has not been seen before in any of the other known zymogen structures of proteases (available for the serine, metallo- and aspartic protease classes). The catalytic triad seems to have an arrangement poised for catalysis. However, the triad is rendered solvent and substrate inaccessible by a strand from the maturation subdomain binding in the active site cleft. Surprisingly, this strand, called the 'blocking' peptide, does not overlap with the 2 kDa 'excision' peptide. Hence, after removal of the excision peptide, up to 35 additional residues must rearrange in order to unblock the active site cleft. A strategically positioned pair of salt bridges, comprising Arg 262, Arg 298, Glu 264 and Asp 300 at the base of the excision peptide, may become destabilized at low pH, unraveling this region of the structure, allowing easier access to cleavage sites and/or promoting the rearrangement event.

A number of research groups are currently involved in designing enzyme and gene therapy procedures for several lysosomal storage diseases. Insight into the three-dimensional structure, protein functioning and stability of PPCA, the first enzyme of known structure associated with a lysosomal storage disease and the third human lysosomal structure to be determined, may prove useful in future designs of an adequate therapy for galactosialidosis. Information from the three-dimensional structure of PPCA might also aid in designing an engineered form of PPCA with increased stability and a longer half-life.

Table 1: X-ray Data Collection Statistics

resolution                                    32.27-2.2 Å
wavelength                                    1.08 Å
space group                                   P2₁2₁2
unit cell                                     a = 115.04, b = 148.11, c = 80.97 Å
temperature of data collection                -178 °C
no. of observed reflections                   436,709
no. of unique reflections                     67,740
completeness of all data                      95.7%
R_sym for all data                            5.1%
completeness of outer shell (2.26-2.20 Å)     87.0%
R_sym in outer shell (2.26-2.20 Å)            13.0%

$R_{\mathrm{sym}} = \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| \,/\, \sum_h \sum_i I_i(h)$, where $I_i(h)$ is the $i$th observation for reflection $h$ and $\langle I(h) \rangle$ is the weighted mean of all the observations.
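
The R_sym definition in the footnote can be made concrete with a short function. Two assumptions are made: the "weighted mean" is taken as an inverse-variance weighted mean (the weighting used by the actual data-processing software is not stated), and the observations are already grouped by unique reflection. The toy intensities are invented. The last line uses Table 1's own counts to show the mean multiplicity of the data set, just under 6.5 observations per unique reflection.

```python
def weighted_mean(observations):
    """Inverse-variance weighted mean of (intensity, sigma) pairs."""
    weights = [1.0 / (sigma * sigma) for _, sigma in observations]
    total = sum(w * i for w, (i, _) in zip(weights, observations))
    return total / sum(weights)


def r_sym(reflections):
    """R_sym = sum_h sum_i |I_i(h) - <I(h)>| / sum_h sum_i I_i(h).

    `reflections` maps a unique Miller index (h, k, l) to the list of
    (I, sigma) observations of that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for obs in reflections.values():
        mean_i = weighted_mean(obs)
        for intensity, _ in obs:
            numerator += abs(intensity - mean_i)
            denominator += intensity
    return numerator / denominator


toy = {  # invented example data
    (1, 0, 0): [(100.0, 10.0), (110.0, 10.0)],
    (0, 1, 0): [(50.0, 5.0), (50.0, 5.0)],
}
print(f"R_sym = {100 * r_sym(toy):.1f}%")

# Mean multiplicity implied by Table 1: observed / unique reflections.
print(f"multiplicity = {436_709 / 67_740:.2f}")
```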
Table 2: Course of Model Building

                              nr. of   nr. of side   mask volume   statistics using data between 8.0 and 3.0 Å
Model                         Cα's     chains        (10^4 Å^3)    R-factor (%)   R-free (%)   CC      CC-free
mol. repl. (mr)               331      125           -             54.2           55.3         0.243   0.244
rigid body ref. (rmr)                                              52.6           52.9         0.287   0.318
calculate NCS matrix
best monomer (bm)             294      228                         53.5           55.0         0.228   0.2?6
rigid body ref.                                                                                0.320   0.328
update NCS matrix
bmc1 (mask 1)                 373      258           10.8          49.9           51.3         0.403   0.424
bmc2 (mask 1)                 405      277           10.8          48.6           48.4         0.443   0.478
bmc3 (mask 2)                 411      307           9.99          47.1           48.6         0.471   0.491
rigid body ref.                                                    46.9           48.4         0.476   0.492
positional ref. (pbmc3)                                            39.4           44.7         0.622   0.562
update NCS matrix
bmc4 (mask 1)                 412      327           10.8          41.7           43.1         0.584   0.585
bmc5 (mask 3)                 435      387           8.88          39.8           40.6         0.621   0.623
bmc6 (mask 4)                 442      413           9.11          38.4           40.2         0.647   0.637

Summary of the bootstrapping procedure. The resulting models are listed chronologically, starting with the molecular replacement solution, i.e. mr (molecular replacement), bm (best monomer core) and the bootstrapping cycles bmc1 through bmc6. The following statistics are given for the various models: the number of Cα atoms built per monomer, the number of correct side chains incorporated per monomer, and the volume of the molecular mask used during averaging, if applicable. The quality of each model is assessed using the R-factor, R-free, CC and CC-free calculated by X-PLOR for data between 8.0 and 3.0 Å. After positional refinement of model bmc3, both monomers were made equivalent by taking one monomer and generating the non-crystallographically related one.
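
The model-quality statistics in Table 2 are standard agreement measures between observed and calculated structure-factor amplitudes. A minimal sketch: the R-factor as a normalized sum of amplitude differences (with an optional overall scale factor `k`), and a plain Pearson correlation as a stand-in for CC. X-PLOR's exact CC definition (for example, whether amplitudes or intensities are correlated) is not stated here and may differ. Evaluating the same functions on reflections excluded from refinement gives the cross-validated ("free") versions; the helper `split_stats` and its flag list are hypothetical names introduced for illustration.

```python
import math


def r_factor(f_obs, f_calc, k=1.0):
    """R = sum | |Fo| - k|Fc| | / sum |Fo| over the given reflections."""
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_stats(f_obs, f_calc, is_free):
    """(R, CC) for the working set and for the held-out ('free') set."""
    out = {}
    for label, flag in (("work", False), ("free", True)):
        fo = [o for o, f in zip(f_obs, is_free) if f == flag]
        fc = [c for c, f in zip(f_calc, is_free) if f == flag]
        out[label] = (r_factor(fo, fc), correlation(fo, fc))
    return out


# Invented amplitudes; the last three reflections form the test ('free') set.
f_obs = [120.0, 80.0, 45.0, 200.0, 150.0, 60.0, 95.0]
f_calc = [110.0, 85.0, 50.0, 190.0, 120.0, 75.0, 90.0]
flags = [False, False, False, False, True, True, True]
print(split_stats(f_obs, f_calc, flags))
```

Because refinement only ever sees the working reflections, a widening gap between the work and free values is the usual warning sign of over-fitting.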




Table 3: Current Status of the Model

Statistics for the data used in refinement:

resolution (Å)    R-factor (%)    completeness (%)
8.0-4.3           22.4            85.7
4.3-3.5           19.0            89.1
3.5-3.0           20.6            89.1
3.0-2.8           21.3            87.9
2.8-2.6           22.3            86.1
2.6-2.4           22.2            84.0
2.4-2.3           22.7            81.3
2.3-2.2           24.0            78.3
8.0-2.2           21.3

Model:

molecules in the asymmetric unit:                  2
residues (out of 904 possible):                    902
sugars:                                            6
waters:                                            296
r.m.s.d. bond length (Å):
r.m.s.d. bond angles (°):
average B-values for main chain atoms (Å²):        16.6
                     side chain atoms (Å²):        18.3
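
The average B-values in Table 3 can be translated into an approximate atomic displacement with the standard isotropic relation $B = 8\pi^2\langle u^2\rangle$. The sketch assumes purely isotropic motion, with $\langle u^2\rangle$ the mean-square displacement along a single direction; under that assumption the table's averages correspond to roughly 0.46 Å for main-chain atoms and 0.48 Å for side-chain atoms.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """RMS displacement (Å) implied by an isotropic B-factor (Å^2).

    Uses B = 8 * pi**2 * <u^2>, where <u^2> is the mean-square
    displacement along one direction.
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


for label, b in (("main chain", 16.6), ("side chain", 18.3)):
    print(f"{label}: B = {b} Å^2 -> rms displacement {rms_displacement(b):.2f} Å")
```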
Table 4

Superposition of the proposed catalytic machinery of the serine carboxypeptidases with known three-dimensional structure: PPCA, CPW (Liao et al., Biochemistry 31:9796-9812 (1992)) and CPY (Endrizzi et al., Biochemistry 33:11106-11120 (1994)).

PPCA residue   atom                   CPW residue   Δ PPCA-CPW (Å)   CPY residue   Δ PPCA-CPY (Å)

Catalytic triad:
Ser 150        Cα                     Ser 146       0.3              Ser 146       0.4
               C                                    0.4                            0.5
               O                                    0.3                            0.4
               Cβ                                   0.3                            0.4
               Oγ                                   0.9                            1.1
His 429        N                      His 397       1.5              His 397       0.9
               Cα                                   0.2                            0.4
               C                                    0.3                            0.4
               O                                    0.3                            0.5
               Cβ                                   0.5                            0.6
               Cγ-Nε2 (range)                       0.3-0.7                        0.4-0.6
Asp 372        N                      Asp 338       0.7              Asp 338       0.5
               Cα                                   0.2                            0.2
               C                                    0.1                            0.1
               O                                    0.1                            0.1
               Cβ                                   0.2                            0.1
               Cγ, Oδ1, Oδ2 (range)                 0.2-0.4                        0.1-0.3

Proposed oxyanion hole (formed by two backbone amides):
Gly 57         N                      Gly 53        0.1              Gly 53        0.5
               Cα                                   0.2                            0.4
               C                                    0.1                            0.4
               O                                    0.3                            0.8
Tyr 151        N                      Tyr 147       0.3              Tyr 147       0.2
               Cα                                   0.2                            0.1
               C                                    0.3                            0.2
               O                                    0.5                            0.2

Proposed regulation of pH-dependent peptidase activity:
Asn 55         averaged over all atoms   Asn 51     0.2              Asn 51        0.2
Glu 69         averaged over all atoms   Glu 65     0.3              Glu 65        0.7
Glu 149        averaged over all atoms   Glu 145    0.4              Glu 145       0.4

The residues forming the proposed catalytic machinery are strictly conserved between the three serine carboxypeptidases. The deviation in distance between the atoms of PPCA and the equivalent atoms in CPW or CPY after superposition is given in ångströms.
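
Table 4 reports distances between equivalent atoms once the three structures have been superimposed. Assuming the coordinates are already in a common frame (the least-squares superposition step itself, for example by the Kabsch method, is not shown), the per-atom deviations, a residue average and the overall RMSD follow directly. The coordinates below are toy values, and because the table does not say whether its residue averages are arithmetic means or RMS values, both are computed.

```python
import math


def atom_deviations(coords_a, coords_b):
    """Per-atom distances (Å) between equivalent atoms of two superposed models."""
    if len(coords_a) != len(coords_b):
        raise ValueError("atom lists must be paired one-to-one")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def mean_deviation(coords_a, coords_b):
    """Arithmetic mean of the per-atom deviations (one reading of 'averaged over all atoms')."""
    d = atom_deviations(coords_a, coords_b)
    return sum(d) / len(d)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over all paired atoms."""
    d = atom_deviations(coords_a, coords_b)
    return math.sqrt(sum(x * x for x in d) / len(d))


# Toy, already superposed coordinates (Å) for two atoms of one residue.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.3), (1.5, 0.4, 0.0)]
print(atom_deviations(model_a, model_b), mean_deviation(model_a, model_b), rmsd(model_a, model_b))
```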
What Is Claimed Is:

1. A method for crystallizing a human protective protein/cathepsin A (PPCA) or precursor human protective protein/cathepsin A (pPPCA), comprising
(a) providing a purified PPCA or pPPCA;
(b) crystallizing the purified PPCA or pPPCA using a hanging drop or diffusion method, to provide crystallized PPCA or pPPCA having biological activity,
wherein the crystallized PPCA or pPPCA is resolvable using x-ray crystallography to obtain x-ray diffraction patterns suitable for three-dimensional structure determination of the PPCA or pPPCA.

2. A method according to claim 1, wherein said PPCA or pPPCA has at least one biological activity selected from the group consisting of enzyme protecting activity, enzyme modulating activity and peptide hydrolyzing activity.

3. A method according to claim 1, wherein said crystallization step is done under conditions of purified PPCA or pPPCA; 2-30% PEG 400-10,000; precipitating salt; buffers, and pH 7-9.

4. A method according to claim 3, wherein the crystallization conditions are PPCA or pPPCA; 5-14% PEG 8000, 40-80 mM tromethamine, 0.05-2.0 mM NaN₃ and pH 8.0-8.3.

5. A crystallized PPCA or pPPCA, or at least one subdomain thereof, provided by a 
method according to claim 1. 

6. A method for providing an atomic model of a PPCA or pPPCA, comprising
(a) providing a computer readable medium having stored thereon atomic coordinate/x-ray diffraction data of said PPCA or pPPCA in crystalline form, said data sufficient to model the three-dimensional structure of said PPCA, said pPPCA, or at least one subdomain thereof;
(b) analyzing, on a computer using at least one subroutine executed in said computer, the atomic coordinate/x-ray diffraction data from (a) to provide data output defining an atomic model of said PPCA or said pPPCA, said analyzing utilizing at least one computing algorithm selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and
(c) obtaining atomic model output data defining the three-dimensional structure of said PPCA, pPPCA or at least one subdomain thereof.

7. A method according to claim 6, wherein said computer readable medium further has stored thereon data corresponding to a nucleic acid sequence or an amino acid sequence data comprising at least one structural domain or a functional domain of a PPCA or pPPCA corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said analyzing step further comprises analyzing said sequence data.
8. A computer readable medium having stored thereon atomic model data of said PPCA or pPPCA as the model output data produced by a method according to claim 6.

9. A computer-based system for providing atomic model data of the three-dimensional structure of a PPCA or a pPPCA, comprising the following elements:
(a) a computer readable medium having stored thereon atomic coordinate/x-ray diffraction data of said PPCA or pPPCA or at least one subdomain thereof;
(b) at least one computing subroutine that, when executed in a computer, causes the computer to analyze the atomic coordinate/x-ray diffraction data from (a) to provide data output defining an atomic model of said PPCA or pPPCA, said analyzing utilizing at least one computing subroutine selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and
(c) retrieval means for obtaining atomic model output data defining the three-dimensional structure of said PPCA, pPPCA or at least one subdomain thereof.

10. A computer-based system according to claim 9, wherein said computer readable medium further has stored thereon data corresponding to a nucleic acid sequence or an amino acid sequence data comprising at least one structural domain or a functional domain of a PPCA or pPPCA corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said at least one subroutine further includes analyzing said sequence data.

11. A computer readable medium, having stored thereon atomic model data of a PPCA, 
pPPCA, or at least one subdomain thereof, produced by a computer system according to claim 9. 

12. A method for providing a computer atomic model of a ligand of a PPCA or pPPCA, comprising
(a) providing a computer readable medium according to claim 11, having stored thereon atomic model data of a PPCA, a pPPCA or at least one subdomain thereof;
(b) providing a computer readable medium having stored thereon atomic model data sufficient to generate atomic models of potential ligands of PPCA or pPPCA;
(c) analyzing on a computer, using at least one subroutine executed in said computer, the atomic model data from (a) and the ligand data from (b), to determine binding sites of PPCA or pPPCA and to provide data output defining an atomic model of a ligand of said PPCA, pPPCA, or at least one subdomain thereof, said analyzing utilizing computing subroutines selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and
(d) obtaining atomic model output data defining the three-dimensional structure of a ligand of said PPCA, pPPCA or at least one subdomain thereof.

13. A computer readable medium having stored thereon the model output data produced by a method according to claim 12.

14. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the atomic model of the ligand model produced by a method according to claim 12.

15. A computer-based system for providing an atomic model of a ligand of a PPCA or pPPCA, comprising the following elements:
(a) a computer readable medium having stored thereon atomic model data of a PPCA or pPPCA;
(b) a computer readable medium having stored thereon atomic model data sufficient to generate atomic models of potential ligands of PPCA or pPPCA;
(c) at least one computing subroutine for analyzing on a computer the atomic model data of PPCA or pPPCA from (a) and the ligand data from (b), to determine binding sites of PPCA or pPPCA and to provide data output defining atomic models of potential ligands of PPCA or pPPCA, said analyzing utilizing at least one computing subroutine selected from the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron density map calculation, electron density modification, electron map visualization, model building, rigid body refinement, positional refinement; and
(d) retrieval means for obtaining atomic model output data defining the atomic models of potential ligands of PPCA or pPPCA.

16. A computer readable medium, comprising atomic model output data of a potential ligand of PPCA or pPPCA, said data produced by a method according to claim 15.

17. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the atomic model of a ligand produced by a computer system according to claim 15.

18. A crystallized pPPCA, having the atomic coordinates presented in Figures 23.1-23.41.